Visual Question Answering
| image_path | question | answer |
|---|---|---|
| imdata/image_000001.jpg | Is there snow on the mountains? | yes |
| imdata/image_000002.jpg | What color are the wheels? | blue |
| imdata/image_000003.jpg | What kind of utensil is in the glass bowl? | knife |
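
Before training, it can help to confirm that the CSV really has the three columns shown in the table. The following is a minimal sketch using only the Python standard library; the helper name `load_vqa_rows` and the in-memory sample are illustrative assumptions, not part of Ludwig.

```python
import csv
import io

# Column names taken from the table above.
REQUIRED_COLUMNS = ("image_path", "question", "answer")


def load_vqa_rows(csv_file):
    """Read rows from an open CSV file and check the three expected columns."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                raise ValueError(f"line {line_no}: empty value in column {col!r}")
        rows.append({col: row[col].strip() for col in REQUIRED_COLUMNS})
    return rows


# Hypothetical in-memory stand-in for vqa.csv, built from the rows above.
sample = io.StringIO(
    "image_path,question,answer\n"
    "imdata/image_000001.jpg,Is there snow on the mountains?,yes\n"
    "imdata/image_000002.jpg,What color are the wheels?,blue\n"
)
rows = load_vqa_rows(sample)
print(len(rows), rows[0]["answer"])
```

To check a real file, pass an open handle instead, for example `load_vqa_rows(open("vqa.csv", newline=""))`.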
ludwig experiment \
  --dataset vqa.csv \
  --config config.yaml
With config.yaml:
input_features:
    -
        name: image_path
        type: image
        encoder:
            type: stacked_cnn
    -
        name: question
        type: text
        encoder:
            type: parallel_cnn
output_features:
    -
        name: answer
        type: text
        decoder:
            type: generator
            cell_type: lstm
        loss:
            type: softmax_cross_entropy
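
The same experiment can also be driven from Python. The sketch below restates the config above as a plain dict and hands it to Ludwig's Python API. It assumes a Ludwig release (0.3 or later) in which `ludwig.api.LudwigModel` accepts a config dict and exposes `experiment(dataset=...)`; the function names `build_vqa_config` and `run_vqa_experiment` are hypothetical helpers introduced here for illustration.

```python
def build_vqa_config():
    """Return the config above as a plain dict (same keys and values as the YAML)."""
    return {
        "input_features": [
            {"name": "image_path", "type": "image", "encoder": {"type": "stacked_cnn"}},
            {"name": "question", "type": "text", "encoder": {"type": "parallel_cnn"}},
        ],
        "output_features": [
            {
                "name": "answer",
                "type": "text",
                "decoder": {"type": "generator", "cell_type": "lstm"},
                "loss": {"type": "softmax_cross_entropy"},
            }
        ],
    }


def run_vqa_experiment(dataset_path="vqa.csv"):
    """Train and evaluate on dataset_path, mirroring `ludwig experiment`."""
    # Imported lazily so the config helper works without Ludwig installed.
    from ludwig.api import LudwigModel  # assumption: Ludwig >= 0.3 Python API

    model = LudwigModel(config=build_vqa_config())
    return model.experiment(dataset=dataset_path)
```

Calling `run_vqa_experiment("vqa.csv")` should behave like the command-line invocation, provided the image paths in the CSV resolve from the working directory.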