Fine-tuning for tabular classification (TabLLM)
Prompt templates in Ludwig use Python-style placeholder notation, where every placeholder corresponds to a column in the input dataset:
```yaml
prompt:
  template: "The {color} {animal} jumped over the {size} {object}"
```
When a prompt template like the one above is provided, the template is rendered once per dataset row, and the fully filled-in prompt is used as the text input feature value for the LLM.
Dataset:
| color | animal | size | object |
| ----- | ------ | ---- | ------ |
| brown | fox | big | dog |
| white | cat | huge | rock |
Prompts:
"The brown fox jumped over the big dog"
"The white cat jumped over the huge rock"
Tabular data can be used directly in an LLM fine-tuning setup by constructing a prompt from the columns of the dataset.
For example, here's a configuration that fine-tunes BERT with a binary classification head on tabular data with the following column names:
- Recency -- months since last donation
- Frequency -- total number of donations
- Monetary -- total blood donated in c.c.
- Time -- months since first donation
Config
```yaml
input_features:
  - name: Recency -- months since last donation
    type: text
    prompt:
      template: >-
        The Recency -- months since last donation is {Recency -- months since
        last donation}. The Frequency -- total number of donations is {Frequency
        -- total number of donations}. The Monetary -- total blood donated in
        c.c. is {Monetary -- total blood donated in c.c.}. The Time -- months
        since first donation is {Time -- months since first donation}.
output_features:
  - name: 'y'
    type: binary
    column: 'y'
preprocessing:
  split:
    type: fixed
    column: split
defaults:
  text:
    encoder:
      type: bert
      trainable: true
combiner:
  type: concat
trainer:
  epochs: 50
  optimizer:
    type: adamw
  learning_rate: 0.00002
  use_mixed_precision: true
  learning_rate_scheduler:
    decay: linear
    warmup_fraction: 0.1
ludwig_version: 0.8.dev
```
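Note that setting `trainable: true` under the text encoder defaults is what makes this a fine-tuning run rather than training a head on frozen BERT embeddings. To launch training with this configuration, the standard Ludwig programmatic API can be used. A minimal sketch, assuming the config above is saved as `config.yaml` and the tabular data (including the `split` column) is saved as `dataset.csv` (both filenames are placeholders):

```python
# Minimal training sketch using Ludwig's programmatic API.
# `config.yaml` and `dataset.csv` are placeholder paths.
from ludwig.api import LudwigModel

model = LudwigModel(config="config.yaml")

# Preprocesses the data, renders the prompt template for each row,
# and fine-tunes the BERT encoder with the binary output head.
train_stats, _, output_dir = model.train(dataset="dataset.csv")

# At inference, the same prompt template is applied to new rows.
predictions, _ = model.predict(dataset="dataset.csv")
print(predictions.head())
```

The same run can equivalently be launched from the command line with `ludwig train --config config.yaml --dataset dataset.csv`.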