Ludwig can train on any table-like dataset, meaning that every feature has its own column and every example its own row.
In this example, we'll use the Rotten Tomatoes dataset, a CSV file with a variety of feature types and a binary target.
Download the data locally here.
Let's take a look at the first 5 rows to see how the data is arranged:
From the command line:

    head -n 5 rotten_tomatoes.csv

Or in Python, using pandas:

    import pandas as pd
    df = pd.read_csv('rotten_tomatoes.csv')
    df.head()
Your results should look a little something like this:
| movie_title | content_rating | genres | runtime | top_critic | review_content | recommended |
|---|---|---|---|---|---|---|
| Deliver Us from Evil | R | Action & Adventure, Horror | 117.0 | TRUE | Director Scott Derrickson and his co-writer, Paul Harris Boardman, deliver a routine procedural with unremarkable frights. | 0 |
| Barbara | PG-13 | Art House & International, Drama | 105.0 | FALSE | Somehow, in this stirring narrative, Barbara manages to keep hold of her principles, and her humanity and courage, and battles to save a dissident teenage girl whose life the Communists are trying to destroy. | 1 |
| Horrible Bosses | R | Comedy | 98.0 | FALSE | These bosses cannot justify either murder or lasting comic memories, fatally compromising a farce that could have been great but ends up merely mediocre. | 0 |
| Money Monster | R | Drama | 98.0 | FALSE | A satire about television that feels like it was made by the kind of people who claim they don't even watch TV. | 0 |
| Battle Royale | NR | Action & Adventure, Art House & International, Drama, Mystery & Suspense | 114.0 | FALSE | Battle Royale is The Hunger Games not diluted for young audiences. | 1 |
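
If you want to confirm that a file really is table-like before training, a quick sanity check along the following lines can help. This is only a minimal sketch using Python's standard `csv` module: the two data rows are copied from the sample above, while the header names and the `check_table` helper are assumptions made for illustration and are not part of Ludwig.

```python
import csv
import io

# Two rows copied from the sample above. The header names are assumed
# for illustration; check them against your own copy of the file.
SAMPLE_CSV = """\
movie_title,content_rating,genres,runtime,top_critic,review_content,recommended
Money Monster,R,Drama,98.0,FALSE,A satire about television that feels like it was made by the kind of people who claim they don't even watch TV.,0
Battle Royale,NR,"Action & Adventure, Art House & International, Drama, Mystery & Suspense",114.0,FALSE,Battle Royale is The Hunger Games not diluted for young audiences.,1
"""


def check_table(text, target):
    """Hypothetical helper: verify that every example row has one value per
    feature column, then return the column names and the distinct values
    found in the target column."""
    rows = list(csv.reader(io.StringIO(text)))
    header, examples = rows[0], rows[1:]
    for i, row in enumerate(examples, start=1):
        if len(row) != len(header):
            raise ValueError(
                f"row {i} has {len(row)} fields, expected {len(header)}"
            )
    col = header.index(target)
    return header, {row[col] for row in examples}


columns, target_values = check_table(SAMPLE_CSV, "recommended")
# A binary target should contain no values other than "0" and "1".
is_binary = target_values <= {"0", "1"}
print(columns)
print(sorted(target_values), is_binary)
```

To run the same check on the downloaded file, you could pass the result of `open('rotten_tomatoes.csv', newline='').read()` as `text`. Note that the multi-genre field must be quoted in the CSV because it contains commas.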