
How Ludwig Works


Ludwig provides an expressive declarative configuration system through which users construct their ML pipeline: data preprocessing, model architecture, backend infrastructure, the training loop, hyperparameter optimization, and more.

input_features:
- name: title
  type: text
  encoder:
    type: rnn
    cell: lstm
    num_layers: 2
    state_size: 128
  preprocessing:
    tokenizer: space_punct
- name: author
  type: category
  encoder:
    embedding_size: 128
  preprocessing:
    most_common: 10000
- name: description
  type: text
  encoder:
    type: bert
- name: cover
  type: image
  encoder:
    type: resnet
    num_layers: 18

output_features:
- name: genre
  type: set
- name: price
  type: number
  preprocessing:
    normalization: zscore

trainer:
  epochs: 50
  batch_size: 256
  optimizer:
    type: adam
    beta1: 0.9
  learning_rate: 0.001

backend:
  type: local
  cache_format: parquet

hyperopt:
  metric: f1
  sampler: random
  parameters:
    title.encoder.num_layers:
      lower: 1
      upper: 5
    trainer.learning_rate:
      values: [0.01, 0.003, 0.001]

See Ludwig configurations for an in-depth reference.
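The same declarative configuration can also be expressed as a plain Python dict and passed to Ludwig's Python API (`ludwig.api.LudwigModel`). The sketch below is abridged (only two input features) and the dataset filename `books.csv` is a hypothetical placeholder; the training call is shown in comments rather than executed.

```python
# Abridged version of the YAML config above, expressed as a Python dict.
config = {
    "input_features": [
        {"name": "title", "type": "text",
         "encoder": {"type": "rnn", "cell": "lstm",
                     "num_layers": 2, "state_size": 128},
         "preprocessing": {"tokenizer": "space_punct"}},
        {"name": "author", "type": "category",
         "encoder": {"embedding_size": 128},
         "preprocessing": {"most_common": 10000}},
    ],
    "output_features": [
        {"name": "genre", "type": "set"},
        {"name": "price", "type": "number",
         "preprocessing": {"normalization": "zscore"}},
    ],
    "trainer": {"epochs": 50, "batch_size": 256,
                "optimizer": {"type": "adam", "beta1": 0.9},
                "learning_rate": 0.001},
}

# With Ludwig installed, training would then look like (not run here):
#   from ludwig.api import LudwigModel
#   model = LudwigModel(config)
#   train_stats, _, _ = model.train(dataset="books.csv")
```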

Data type abstractions

Every feature in Ludwig is described by a specific data type. Each data type maps to a specific set of modules that handle preprocessing, encoding, decoding, and post-processing for that type. Conversely, every module (preprocessor, encoder, decoder) is registered to the specific set of data types it supports.
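The registration idea can be sketched with a minimal registry (this is an illustration, not Ludwig's actual implementation): each encoder class declares the data types it supports, and the registry lets the framework look a module up by data type and name.

```python
# Minimal sketch of a (data type -> encoder name -> class) registry.
ENCODER_REGISTRY: dict[str, dict[str, type]] = {}

def register_encoder(name, dtypes):
    """Class decorator: register an encoder under every data type it supports."""
    def wrap(cls):
        for dtype in dtypes:
            ENCODER_REGISTRY.setdefault(dtype, {})[name] = cls
        return cls
    return wrap

@register_encoder("rnn", dtypes=["text", "sequence", "timeseries"])
class RNNEncoder:
    pass

@register_encoder("resnet", dtypes=["image"])
class ResNetEncoder:
    pass

def get_encoder(dtype, name):
    """Resolve an encoder class, rejecting unsupported (type, encoder) pairs."""
    try:
        return ENCODER_REGISTRY[dtype][name]
    except KeyError:
        raise ValueError(f"encoder {name!r} does not support type {dtype!r}")
```

The same pattern extends naturally to preprocessors and decoders, which is what makes an invalid pairing (e.g. a resnet encoder on a text feature) detectable at configuration-validation time.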

Read more about Ludwig's supported feature types.

ECD Architecture

Ludwig’s core modeling architecture is referred to as ECD (encoder-combiner-decoder). Multiple input features are encoded and fed to the Combiner model, which operates on the encoded inputs to combine them. On the output side, the combiner's outputs are fed to a decoder for each output feature for prediction and post-processing. Find out more about Ludwig's Combiner models like TabNet, Transformer, and Concat (Wide and Deep learning).
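The ECD data flow can be sketched in a few lines of plain Python (a toy illustration with stand-in functions, not real neural modules): each input feature is encoded, a concat-style combiner merges the encodings into one representation, and one decoder per output feature maps it to a prediction.

```python
def ecd_forward(inputs, encoders, combiner, decoders):
    # Encode every input feature independently.
    encoded = {name: encoders[name](value) for name, value in inputs.items()}
    # Combine all encoded representations into one.
    combined = combiner(encoded)
    # Decode one prediction per output feature.
    return {name: decode(combined) for name, decode in decoders.items()}

# Stand-in modules: encoders map raw values to "vectors" (lists of floats),
# the concat combiner flattens them into a single vector, and the decoder
# reduces that vector to a prediction.
encoders = {"title": lambda s: [float(len(s))],
            "price": lambda x: [x, x * x]}
combiner = lambda enc: [v for name in sorted(enc) for v in enc[name]]
decoders = {"genre": lambda vec: "fiction" if sum(vec) > 10 else "nonfiction"}

preds = ecd_forward({"title": "Dune", "price": 9.99},
                    encoders, combiner, decoders)
```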

Visualized, the ECD architecture looks like a butterfly, which is why we sometimes refer to it as the β€œbutterfly architecture”.


ECD flexibly handles many different combinations of input and output data types, making the tool well-suited for many different applications.


Take a look at Examples to see how you can use Ludwig for many different applications.

Distributed training, data processing, and hyperparameter search with Ray

Ludwig on Ray enables users to scale their training process from running on their local laptop, to running in the cloud on a GPU instance, to scaling across hundreds of machines in parallel, all without changing a single line of code.


By integrating with Ray, Ludwig is able to provide a unified way to do distributed training, data processing, and hyperparameter search:

  • Ray enables you to provision a cluster of machines in a single command through its cluster launcher.
  • Horovod on Ray enables you to do distributed training without needing to configure MPI in your environment.
  • Dask on Ray enables you to process large datasets that don’t fit in memory on a single machine.
  • Ray Tune enables you to easily run distributed hyperparameter search across many machines in parallel.
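Concretely, moving from a laptop to a Ray cluster is a change to the backend section of the config rather than to any code. The fragment below is a sketch; the exact keys available depend on the Ludwig version.

```yaml
backend:
  type: ray
  trainer:
    num_workers: 4
    resources_per_worker:
      CPU: 2
      GPU: 1
```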