Text Classification

This example shows how to build a text classifier with Ludwig. It can be run with the Reuters-21578 dataset, specifically the version available on CMU's Text Analytics course website. Other datasets on the same webpage, such as OHSUMED, a well-known medical abstracts dataset, and Epinions.com, a dataset of product reviews, can be used as well because their column names are the same.

| text | class |
|------|-------|
| Toronto Feb 26 - Standard Trustco said it expects earnings in 1987 to increase at least 15... | earnings |
| New York Feb 26 - American Express Co remained silent on market rumors... | acquisition |
| BANGKOK March 25 - Vietnam will resettle 300000 people on state farms known as new economic... | coffee |
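
To make the expected file layout concrete, here is a minimal sketch that writes a CSV with the two columns shown above, `text` and `class`. The helper names `write_dataset` and `read_header` and the constant `SAMPLE_ROWS` are hypothetical and exist only for this illustration; the rows are copied from the table, truncation included, and are not the full dataset.

```python
import csv

# Sample rows copied from the table above. The trailing "..." is the
# truncation from the original table and is kept as-is.
SAMPLE_ROWS = [
    ("Toronto Feb 26 - Standard Trustco said it expects earnings in 1987 "
     "to increase at least 15...", "earnings"),
    ("New York Feb 26 - American Express Co remained silent on market "
     "rumors...", "acquisition"),
    ("BANGKOK March 25 - Vietnam will resettle 300000 people on state farms "
     "known as new economic...", "coffee"),
]


def write_dataset(path, rows):
    """Write (text, class) pairs to a CSV whose header matches the feature names."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "class"])  # column names used by the config
        writer.writerows(rows)


def read_header(path):
    """Return the header row of a CSV file as a list of column names."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))
```

Calling `write_dataset("text_classification.csv", SAMPLE_ROWS)` would produce a file with the name used by the command in this example. Any dataset with the same two column names can be substituted without changing the configuration.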

Train and evaluate the model with:

ludwig experiment \
  --dataset text_classification.csv \
  --config_file config.yaml

With config.yaml:

input_features:
    -
        name: text
        type: text
        level: word
        encoder: parallel_cnn

output_features:
    -
        name: class
        type: category
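
The same experiment can likely be driven from Python instead of the command line. The sketch below mirrors config.yaml as a dictionary and assumes a Ludwig release whose Python API exposes `LudwigModel(config)` and `LudwigModel.experiment(dataset=...)`; other releases may use different argument names, so check the API reference for the installed version. The function name `run_experiment` is hypothetical.

```python
# The same settings as config.yaml, expressed as a Python dictionary.
config = {
    "input_features": [
        {
            "name": "text",
            "type": "text",
            "level": "word",
            "encoder": "parallel_cnn",
        }
    ],
    "output_features": [
        {
            "name": "class",
            "type": "category",
        }
    ],
}


def run_experiment(dataset_path="text_classification.csv"):
    """Train and evaluate a model on dataset_path using the config above."""
    # Imported lazily so the config can be inspected without Ludwig installed.
    # Assumption: the installed Ludwig accepts a config dict in the constructor
    # and provides an experiment() method that takes a dataset path.
    from ludwig.api import LudwigModel

    model = LudwigModel(config)
    return model.experiment(dataset=dataset_path)
```

Keeping the configuration as a dictionary makes it easy to vary a single setting, such as the encoder, between runs while leaving the rest untouched.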