Configuration
Configuration Structure
Ludwig models are configured with a single config containing the following top-level parameters:
model_type: ecd
input_features: []
output_features: []
combiner: {}
preprocessing: {}
defaults: {}
trainer: {}
hyperopt: {}
backend: {}
The config specifies input features, output features, preprocessing, model architecture, training loop, hyperparameter search, and backend infrastructure -- everything that's needed to build, train, and evaluate a model:
- model_type: the model variant used for training. Defaults to ECD, a neural network-based architecture. Also supports LLM (large language model, for text generation) and GBM (gradient-boosted machine, a tree-based model).
- input_features: which columns from your training dataset will be used as inputs to the model, what their data types are, how they should be preprocessed, and how they should be encoded.
- output_features: the targets we want the model to learn to predict. The data type of the output feature defines the task (number is a regression task, category is a multi-class classification task, etc.).
- combiner: the backbone model architecture that takes as input all encoded input features and transforms them into a single embedding vector. The combiner effectively combines individual feature-level models into a model that can accept any number of inputs. GBM models do not make use of the combiner.
- preprocessing: global preprocessing options including how to split the dataset and how to sample the data.
- defaults: default feature configuration. Useful when you have many input features of the same type and want to apply the same preprocessing, encoders, etc. to all of them. Overridden by the feature-level configuration if provided (see the sketch after this list).
- trainer: hyperparameters used to control the training process, including batch size, learning rate, number of training epochs, etc.
- hyperopt: hyperparameter optimization options. Any param from the previous sections can be treated as a hyperparameter and explored in combination with other config params.
- backend: infrastructure and runtime options, including what libraries and distribution strategies will be used during training, how many cluster resources to use per training worker, how many total workers, whether to use GPUs, etc.
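To illustrate how these sections compose, here is a minimal sketch of a config expressed as a Python dictionary that sets a type-level default, a couple of trainer hyperparameters, and a small hyperopt search over the learning rate. The column names (review, product_tag, rating) are hypothetical placeholders for columns in your own dataset:

```python
# Sketch of a config combining defaults, trainer, and hyperopt sections.
# The column names below are hypothetical; substitute your own dataset's columns.
config = {
    "model_type": "ecd",
    "input_features": [
        {"name": "review", "type": "text"},           # hypothetical text column
        {"name": "product_tag", "type": "category"},  # hypothetical category column
    ],
    "output_features": [
        {"name": "rating", "type": "number"},         # hypothetical numeric target
    ],
    # Applied to every text input feature unless overridden at the feature level.
    "defaults": {
        "text": {"preprocessing": {"lowercase": True}},
    },
    "trainer": {
        "epochs": 20,
        "learning_rate": 0.0005,
    },
    # Any config param can be searched by referencing its dotted path.
    "hyperopt": {
        "goal": "minimize",
        "metric": "loss",
        "output_feature": "rating",
        "parameters": {
            "trainer.learning_rate": {
                "space": "loguniform",
                "lower": 1e-4,
                "upper": 1e-2,
            },
        },
    },
}
```

The same structure can equivalently be written as YAML, as in the example further below.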
The Ludwig configuration mixes ease of use, by means of reasonable defaults, with flexibility, by means of detailed control over the parameters of your model. Only input_features and output_features are required, while all other fields use reasonable defaults and can optionally be set or modified manually.
The config can be expressed as a Python dictionary (--config_str for Ludwig's CLI) or as a YAML file (--config).
YAML:
input_features:
    -
        name: Pclass
        type: category
    -
        name: Sex
        type: category
    -
        name: Age
        type: number
        preprocessing:
            missing_value_strategy: fill_with_mean
    -
        name: SibSp
        type: number
    -
        name: Parch
        type: number
    -
        name: Fare
        type: number
        preprocessing:
            missing_value_strategy: fill_with_mean
    -
        name: Embarked
        type: category
output_features:
    -
        name: Survived
        type: binary
Python dict:
{
    "input_features": [
        {
            "name": "Pclass",
            "type": "category"
        },
        {
            "name": "Sex",
            "type": "category"
        },
        {
            "name": "Age",
            "type": "number",
            "preprocessing": {
                "missing_value_strategy": "fill_with_mean"
            }
        },
        {
            "name": "SibSp",
            "type": "number"
        },
        {
            "name": "Parch",
            "type": "number"
        },
        {
            "name": "Fare",
            "type": "number",
            "preprocessing": {
                "missing_value_strategy": "fill_with_mean"
            }
        },
        {
            "name": "Embarked",
            "type": "category"
        }
    ],
    "output_features": [
        {
            "name": "Survived",
            "type": "binary"
        }
    ]
}
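Either form can also be passed directly to Ludwig's Python API. A minimal sketch, where titanic.csv is a hypothetical path to the training dataset:

```python
from ludwig.api import LudwigModel

# The config can be a Python dict (like the one above) or a path to a YAML file.
model = LudwigModel(config="config.yaml")

# "titanic.csv" is a hypothetical path to a local copy of the dataset.
train_stats, preprocessed_data, output_directory = model.train(dataset="titanic.csv")
```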
Rendered Defaults
Ludwig has many parameter options, but with the exception of input and output feature names and types, all of the other parameters are optional. When a parameter is unspecified, Ludwig assigns it a reasonable default value. Ludwig defines "reasonable" to mean that it is unlikely to produce bad results and will train in a reasonable amount of time on commodity hardware. In other words, Ludwig defaults are intended to be good baseline configs upon which more advanced options can be layered.
Here's an example of a minimal config generated from the following command:
ludwig init_config --dataset ludwig://sst2 --target label --output sst2.yaml
input_features:
- name: sentence
  type: text
output_features:
- name: label
  type: binary
And here is the fully rendered config generated from the following command:
ludwig render_config --config sst2.yaml --output sst2_rendered.yaml
input_features:
- active: true
name: sentence
type: text
column: sentence
proc_column: sentence_jcnVJf
tied: null
preprocessing:
pretrained_model_name_or_path: null
tokenizer: space_punct
vocab_file: null
sequence_length: null
max_sequence_length: 256
most_common: 20000
padding_symbol: <PAD>
unknown_symbol: <UNK>
padding: right
lowercase: false
missing_value_strategy: fill_with_const
fill_value: <UNK>
computed_fill_value: <UNK>
ngram_size: 2
cache_encoder_embeddings: false
compute_idf: false
prompt:
template: null
task: null
retrieval:
type: null
index_name: null
model_name: null
k: 0
encoder:
type: parallel_cnn
skip: false
dropout: 0.0
activation: relu
max_sequence_length: null
representation: dense
vocab: null
use_bias: true
bias_initializer: zeros
weights_initializer: xavier_uniform
should_embed: true
embedding_size: 256
embeddings_on_cpu: false
embeddings_trainable: true
pretrained_embeddings: null
reduce_output: sum
num_conv_layers: null
conv_layers: null
num_filters: 256
filter_size: 3
pool_function: max
pool_size: null
output_size: 256
norm: null
norm_params: null
num_fc_layers: null
fc_layers: null
output_features:
- active: true
name: label
type: binary
column: label
proc_column: label_2Xl8CP
reduce_input: sum
default_validation_metric: roc_auc
dependencies: []
reduce_dependencies: sum
input_size: null
num_classes: null
decoder:
type: regressor
fc_layers: null
num_fc_layers: 0
fc_output_size: 256
fc_use_bias: true
fc_weights_initializer: xavier_uniform
fc_bias_initializer: zeros
fc_norm: null
fc_norm_params: null
fc_activation: relu
fc_dropout: 0.0
input_size: null
use_bias: true
weights_initializer: xavier_uniform
bias_initializer: zeros
loss:
type: binary_weighted_cross_entropy
weight: 1.0
positive_class_weight: null
robust_lambda: 0
confidence_penalty: 0
calibration: false
preprocessing:
missing_value_strategy: drop_row
fallback_true_label: null
fill_value: null
computed_fill_value: null
threshold: 0.5
model_type: ecd
trainer:
validation_field: label
validation_metric: roc_auc
early_stop: 5
skip_all_evaluation: false
enable_profiling: false
profiler:
wait: 1
warmup: 1
active: 3
repeat: 5
skip_first: 0
learning_rate: 0.001
learning_rate_scheduler:
decay: null
decay_rate: 0.96
decay_steps: 10000
staircase: false
reduce_on_plateau: 0
reduce_on_plateau_patience: 10
reduce_on_plateau_rate: 0.1
warmup_evaluations: 0
warmup_fraction: 0.0
reduce_eval_metric: loss
reduce_eval_split: training
t_0: null
t_mult: 1
eta_min: 0
epochs: 100
checkpoints_per_epoch: 0
train_steps: null
eval_steps: null
steps_per_checkpoint: 0
effective_batch_size: auto
batch_size: auto
max_batch_size: 1099511627776
gradient_accumulation_steps: auto
eval_batch_size: null
evaluate_training_set: false
optimizer:
type: adam
betas:
- 0.9
- 0.999
eps: 1.0e-08
weight_decay: 0.0
amsgrad: false
regularization_type: l2
regularization_lambda: 0.0
should_shuffle: true
increase_batch_size_on_plateau: 0
increase_batch_size_on_plateau_patience: 5
increase_batch_size_on_plateau_rate: 2.0
increase_batch_size_eval_metric: loss
increase_batch_size_eval_split: training
gradient_clipping:
clipglobalnorm: 0.5
clipnorm: null
clipvalue: null
learning_rate_scaling: linear
bucketing_field: null
use_mixed_precision: false
compile: false
enable_gradient_checkpointing: false
preprocessing:
sample_ratio: 1.0
sample_size: null
oversample_minority: null
undersample_majority: null
split:
type: random
probabilities:
- 0.7
- 0.1
- 0.2
global_max_sequence_length: null
defaults:
audio:
preprocessing:
audio_file_length_limit_in_s: 7.5
missing_value_strategy: bfill
fill_value: null
computed_fill_value: null
in_memory: true
padding_value: 0.0
norm: null
type: fbank
window_length_in_s: 0.04
window_shift_in_s: 0.02
num_fft_points: null
window_type: hamming
num_filter_bands: 80
encoder:
type: parallel_cnn
skip: false
dropout: 0.0
activation: relu
max_sequence_length: null
representation: dense
vocab: null
use_bias: true
bias_initializer: zeros
weights_initializer: xavier_uniform
should_embed: true
embedding_size: 256
embeddings_on_cpu: false
embeddings_trainable: true
pretrained_embeddings: null
reduce_output: sum
num_conv_layers: null
conv_layers: null
num_filters: 256
filter_size: 3
pool_function: max
pool_size: null
output_size: 256
norm: null
norm_params: null
num_fc_layers: null
fc_layers: null
bag:
preprocessing:
tokenizer: space
missing_value_strategy: fill_with_const
fill_value: <UNK>
computed_fill_value: <UNK>
lowercase: false
most_common: 10000
encoder:
type: embed
skip: false
dropout: 0.0
activation: relu
vocab: null
representation: dense
embedding_size: 50
force_embedding_size: false
embeddings_on_cpu: false
embeddings_trainable: true
pretrained_embeddings: null
use_bias: true
bias_initializer: zeros
weights_initializer: xavier_uniform
output_size: 10
norm: null
norm_params: null
num_fc_layers: 0
fc_layers: null
binary:
decoder:
type: regressor
fc_layers: null
num_fc_layers: 0
fc_output_size: 256
fc_use_bias: true
fc_weights_initializer: xavier_uniform
fc_bias_initializer: zeros
fc_norm: null
fc_norm_params: null
fc_activation: relu
fc_dropout: 0.0
input_size: null
use_bias: true
weights_initializer: xavier_uniform
bias_initializer: zeros
loss:
type: binary_weighted_cross_entropy
weight: 1.0
positive_class_weight: null
robust_lambda: 0
confidence_penalty: 0
preprocessing:
missing_value_strategy: fill_with_false
fallback_true_label: null
fill_value: null
computed_fill_value: null
encoder:
type: passthrough
skip: false
category:
decoder:
type: classifier
fc_layers: null
num_fc_layers: 0
fc_output_size: 256
fc_use_bias: true
fc_weights_initializer: xavier_uniform
fc_bias_initializer: zeros
fc_norm: null
fc_norm_params: null
fc_activation: relu
fc_dropout: 0.0
input_size: null
num_classes: null
use_bias: true
weights_initializer: xavier_uniform
bias_initializer: zeros
loss:
type: softmax_cross_entropy
weight: 1.0
class_weights: null
robust_lambda: 0
confidence_penalty: 0
class_similarities: null
class_similarities_temperature: 0
preprocessing:
missing_value_strategy: fill_with_const
fill_value: <UNK>
computed_fill_value: <UNK>
lowercase: false
most_common: 10000
cache_encoder_embeddings: false
encoder:
type: dense
skip: false
dropout: 0.0
vocab: null
embedding_initializer: null
embedding_size: 50
embeddings_on_cpu: false
embeddings_trainable: true
pretrained_embeddings: null
date:
preprocessing:
missing_value_strategy: fill_with_const
fill_value: ''
computed_fill_value: ''
datetime_format: null
encoder:
type: embed
skip: false
dropout: 0.0
activation: relu
use_bias: true
bias_initializer: zeros
weights_initializer: xavier_uniform
embedding_size: 10
embeddings_on_cpu: false
output_size: 10
norm: null
norm_params: null
num_fc_layers: 0
fc_layers: null
h3:
preprocessing:
missing_value_strategy: fill_with_const
fill_value: 576495936675512319
computed_fill_value: 576495936675512319
encoder:
type: embed
skip: false
dropout: 0.0
activation: relu
use_bias: true
bias_initializer: zeros
weights_initializer: xavier_uniform
embedding_size: 10
embeddings_on_cpu: false
reduce_output: sum
output_size: 10
norm: null
norm_params: null
num_fc_layers: 0
fc_layers: null
image:
preprocessing:
missing_value_strategy: bfill
fill_value: null
computed_fill_value: null
height: null
width: null
num_channels: null
resize_method: interpolate
infer_image_num_channels: true
infer_image_dimensions: true
infer_image_max_height: 256
infer_image_max_width: 256
infer_image_sample_size: 100
standardize_image: null
in_memory: true
num_processes: 1
requires_equal_dimensions: false
encoder:
type: stacked_cnn
skip: false
conv_dropout: 0.0
conv_activation: relu
height: null
width: null
num_channels: null
out_channels: 32
kernel_size: 3
stride: 1
padding_mode: zeros
padding: valid
dilation: 1
groups: 1
pool_function: max
pool_kernel_size: 2
pool_stride: null
pool_padding: 0
pool_dilation: 1
output_size: 128
conv_use_bias: true
conv_norm: null
conv_norm_params: null
num_conv_layers: null
conv_layers: null
fc_dropout: 0.0
fc_activation: relu
fc_use_bias: true
fc_bias_initializer: zeros
fc_weights_initializer: xavier_uniform
fc_norm: null
fc_norm_params: null
num_fc_layers: 1
fc_layers: null
augmentation: []
number:
decoder:
type: regressor
fc_layers: null
num_fc_layers: 0
fc_output_size: 256
fc_use_bias: true
fc_weights_initializer: xavier_uniform
fc_bias_initializer: zeros
fc_norm: null
fc_norm_params: null
fc_activation: relu
fc_dropout: 0.0
input_size: null
use_bias: true
weights_initializer: xavier_uniform
bias_initializer: zeros
loss:
type: mean_squared_error
weight: 1.0
preprocessing:
missing_value_strategy: fill_with_const
fill_value: 0.0
computed_fill_value: 0.0
normalization: zscore
outlier_strategy: null
outlier_threshold: 3.0
computed_outlier_fill_value: 0.0
encoder:
type: passthrough
skip: false
sequence:
decoder:
type: generator
fc_layers: null
num_fc_layers: 0
fc_output_size: 256
fc_use_bias: true
fc_weights_initializer: xavier_uniform
fc_bias_initializer: zeros
fc_norm: null
fc_norm_params: null
fc_activation: relu
fc_dropout: 0.0
vocab_size: null
max_sequence_length: null
cell_type: gru
input_size: 256
reduce_input: sum
num_layers: 1
loss:
type: sequence_softmax_cross_entropy
weight: 1.0
class_weights: null
robust_lambda: 0
confidence_penalty: 0
class_similarities: null
class_similarities_temperature: 0
unique: false
preprocessing:
tokenizer: space
vocab_file: null
sequence_length: null
max_sequence_length: 256
most_common: 20000
padding_symbol: <PAD>
unknown_symbol: <UNK>
padding: right
lowercase: false
missing_value_strategy: fill_with_const
fill_value: <UNK>
computed_fill_value: <UNK>
ngram_size: 2
cache_encoder_embeddings: false
encoder:
type: embed
skip: false
dropout: 0.0
max_sequence_length: null
representation: dense
vocab: null
weights_initializer: uniform
reduce_output: sum
embedding_size: 256
embeddings_on_cpu: false
embeddings_trainable: true
pretrained_embeddings: null
set:
decoder:
type: classifier
fc_layers: null
num_fc_layers: 0
fc_output_size: 256
fc_use_bias: true
fc_weights_initializer: xavier_uniform
fc_bias_initializer: zeros
fc_norm: null
fc_norm_params: null
fc_activation: relu
fc_dropout: 0.0
input_size: null
num_classes: null
use_bias: true
weights_initializer: xavier_uniform
bias_initializer: zeros
loss:
type: sigmoid_cross_entropy
weight: 1.0
class_weights: null
preprocessing:
tokenizer: space
missing_value_strategy: fill_with_const
fill_value: <UNK>
computed_fill_value: <UNK>
lowercase: false
most_common: 10000
encoder:
type: embed
skip: false
dropout: 0.0
activation: relu
representation: dense
vocab: null
use_bias: true
bias_initializer: zeros
weights_initializer: xavier_uniform
embedding_size: 50
embeddings_on_cpu: false
embeddings_trainable: true
pretrained_embeddings: null
output_size: 10
norm: null
norm_params: null
num_fc_layers: 0
fc_layers: null
text:
decoder:
type: generator
fc_layers: null
num_fc_layers: 0
fc_output_size: 256
fc_use_bias: true
fc_weights_initializer: xavier_uniform
fc_bias_initializer: zeros
fc_norm: null
fc_norm_params: null
fc_activation: relu
fc_dropout: 0.0
vocab_size: null
max_sequence_length: null
cell_type: gru
input_size: 256
reduce_input: sum
num_layers: 1
loss:
type: sequence_softmax_cross_entropy
weight: 1.0
class_weights: null
robust_lambda: 0
confidence_penalty: 0
class_similarities: null
class_similarities_temperature: 0
unique: false
preprocessing:
pretrained_model_name_or_path: null
tokenizer: space_punct
vocab_file: null
sequence_length: null
max_sequence_length: 256
most_common: 20000
padding_symbol: <PAD>
unknown_symbol: <UNK>
padding: right
lowercase: false
missing_value_strategy: fill_with_const
fill_value: <UNK>
computed_fill_value: <UNK>
ngram_size: 2
cache_encoder_embeddings: false
compute_idf: false
prompt:
template: null
task: null
retrieval:
type: null
index_name: null
model_name: null
k: 0
encoder:
type: parallel_cnn
skip: false
dropout: 0.0
activation: relu
max_sequence_length: null
representation: dense
vocab: null
use_bias: true
bias_initializer: zeros
weights_initializer: xavier_uniform
should_embed: true
embedding_size: 256
embeddings_on_cpu: false
embeddings_trainable: true
pretrained_embeddings: null
reduce_output: sum
num_conv_layers: null
conv_layers: null
num_filters: 256
filter_size: 3
pool_function: max
pool_size: null
output_size: 256
norm: null
norm_params: null
num_fc_layers: null
fc_layers: null
timeseries:
decoder:
type: projector
fc_layers: null
num_fc_layers: 0
fc_output_size: 256
fc_use_bias: true
fc_weights_initializer: xavier_uniform
fc_bias_initializer: zeros
fc_norm: null
fc_norm_params: null
fc_activation: relu
fc_dropout: 0.0
input_size: null
output_size: null
use_bias: true
weights_initializer: xavier_uniform
bias_initializer: zeros
activation: null
multiplier: 1.0
clip: null
loss:
type: huber
weight: 1.0
delta: 1.0
preprocessing:
tokenizer: space
timeseries_length_limit: 256
padding_value: 0.0
padding: right
missing_value_strategy: fill_with_const
fill_value: ''
computed_fill_value: ''
window_size: 0
encoder:
type: parallel_cnn
skip: false
dropout: 0.0
activation: relu
max_sequence_length: null
representation: dense
vocab: null
use_bias: true
bias_initializer: zeros
weights_initializer: xavier_uniform
should_embed: true
embedding_size: 256
embeddings_on_cpu: false
embeddings_trainable: true
pretrained_embeddings: null
reduce_output: sum
num_conv_layers: null
conv_layers: null
num_filters: 256
filter_size: 3
pool_function: max
pool_size: null
output_size: 256
norm: null
norm_params: null
num_fc_layers: null
fc_layers: null
vector:
decoder:
type: projector
fc_layers: null
num_fc_layers: 0
fc_output_size: 256
fc_use_bias: true
fc_weights_initializer: xavier_uniform
fc_bias_initializer: zeros
fc_norm: null
fc_norm_params: null
fc_activation: relu
fc_dropout: 0.0
input_size: null
output_size: null
use_bias: true
weights_initializer: xavier_uniform
bias_initializer: zeros
activation: null
multiplier: 1.0
clip: null
loss:
type: mean_squared_error
weight: 1.0
preprocessing:
vector_size: null
missing_value_strategy: fill_with_const
fill_value: ''
computed_fill_value: ''
encoder:
type: dense
skip: false
dropout: 0.0
activation: relu
input_size: null
output_size: 256
use_bias: true
bias_initializer: zeros
weights_initializer: xavier_uniform
norm: null
norm_params: null
num_layers: 1
fc_layers: null
hyperopt: null
backend: null
ludwig_version: 0.9.3
combiner:
type: concat
dropout: 0.0
activation: relu
flatten_inputs: false
residual: false
use_bias: true
bias_initializer: zeros
weights_initializer: xavier_uniform
num_fc_layers: 0
output_size: 256
norm: null
norm_params: null
fc_layers: null
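Because the rendered output is itself a complete, valid config, it can be loaded, tweaked, and fed back into training programmatically. A minimal sketch, assuming sst2_rendered.yaml was produced by the command above and that sst2.csv is a hypothetical local export of the dataset:

```python
import yaml
from ludwig.api import LudwigModel

# Load the fully rendered config produced by `ludwig render_config`.
with open("sst2_rendered.yaml") as f:
    config = yaml.safe_load(f)

# Override one of the rendered defaults before training.
config["trainer"]["epochs"] = 10

# "sst2.csv" is a hypothetical local copy of the SST-2 dataset.
model = LudwigModel(config=config)
model.train(dataset="sst2.csv")
```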