Visual Question Answering
| image_path | question | answer | 
|---|---|---|
| imdata/image_000001.jpg | Is there snow on the mountains? | yes | 
| imdata/image_000002.jpg | What color are the wheels | blue | 
| imdata/image_000003.jpg | What kind of utensil is in the glass bowl | knife | 
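The table corresponds to a CSV file whose header row holds the three column names. The following is a minimal sketch, assuming Python 3 and only the standard `csv` module, that builds that CSV from the three sample rows. The names `FIELDS`, `ROWS`, and `vqa_csv_text` are illustrative and not part of Ludwig. The images under `imdata/` are assumed to already exist; this code does not create them.

```python
import csv
import io

# Column names from the table header; they must match the feature names in the config.
FIELDS = ["image_path", "question", "answer"]

# The three sample rows from the table, copied verbatim.
ROWS = [
    {"image_path": "imdata/image_000001.jpg",
     "question": "Is there snow on the mountains?", "answer": "yes"},
    {"image_path": "imdata/image_000002.jpg",
     "question": "What color are the wheels", "answer": "blue"},
    {"image_path": "imdata/image_000003.jpg",
     "question": "What kind of utensil is in the glass bowl", "answer": "knife"},
]


def vqa_csv_text() -> str:
    """Return the rows as CSV text with a header line (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(ROWS)
    return buf.getvalue()


if __name__ == "__main__":
    # Print the CSV; redirect to vqa.csv to produce the file used below.
    print(vqa_csv_text(), end="")
```

Running it as `python make_vqa_csv.py > vqa.csv` (the script name is hypothetical) produces a file with one header line and one line per example.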
```shell
ludwig experiment \
  --dataset vqa.csv \
  --config config.yaml
```
With `config.yaml`:

```yaml
input_features:
    -
        name: image_path
        type: image
        encoder:
            type: stacked_cnn
    -
        name: question
        type: text
        encoder:
            type: parallel_cnn
output_features:
    -
        name: answer
        type: text
        decoder:
            type: generator
            cell_type: lstm
        loss:
            type: softmax_cross_entropy
```
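By default, each feature's `name` is also the CSV column Ludwig reads it from, so a misspelled name only shows up once the experiment starts. As a sketch under stated assumptions, a quick pre-flight check can catch this earlier. To stay within the Python standard library, it mirrors the YAML above as a dict instead of parsing the file. `CONFIG` and `missing_columns` are hypothetical names and are not part of Ludwig's API.

```python
import csv
import io
from typing import List

# Python mirror of config.yaml (assumption: kept in sync with the YAML by hand).
CONFIG = {
    "input_features": [
        {"name": "image_path", "type": "image", "encoder": {"type": "stacked_cnn"}},
        {"name": "question", "type": "text", "encoder": {"type": "parallel_cnn"}},
    ],
    "output_features": [
        {"name": "answer", "type": "text",
         "decoder": {"type": "generator", "cell_type": "lstm"},
         "loss": {"type": "softmax_cross_entropy"}},
    ],
}


def missing_columns(config: dict, csv_header_line: str) -> List[str]:
    """Return feature names that have no matching column in the CSV header line."""
    header = next(csv.reader(io.StringIO(csv_header_line)))
    names = [feature["name"]
             for section in ("input_features", "output_features")
             for feature in config.get(section, [])]
    return [name for name in names if name not in header]


if __name__ == "__main__":
    # Header line of vqa.csv, as in the table above.
    print(missing_columns(CONFIG, "image_path,question,answer"))
```

Running the script prints `[]`, meaning every input and output feature has a matching column. To check a real file, pass it the first line of `vqa.csv`.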