├── docker                 - Ludwig Docker images
├── examples               - Configs demonstrating Ludwig on various tasks
├── ludwig                 - Ludwig library source code
│   ├── automl             - Configurations, defaults, and utilities for AutoML
│   ├── backend            - Execution backends (local, horovod, ray)
│   ├── benchmarking       - Performance benchmarks for training and hyperopt
│   ├── combiners          - Combiners used in ECD models
│   ├── contribs           - 3rd-party integrations (MLFlow, WandB, Comet)
│   ├── data               - Data loading, pre/postprocessing, sampling
│   ├── datasets           - Ludwig Dataset Zoo: API to download pre-configured datasets.
│   ├── decoders           - Output feature decoders
│   ├── encoders           - Input feature encoders
│   ├── explain            - Utilities for explaining model predictions
│   ├── features           - Implementations of feature types
│   ├── hyperopt
│   ├── models             - Implementations of ECD, trainer, predictor.
│   ├── modules            - Torch modules including layers, metrics, and losses
│   ├── profiling          - Dataset profiles
│   ├── schema             - The complete schema of the ludwig config.yaml
│   ├── trainers
│   ├── utils              - Various internal utilities used by ludwig python modules
│   ├── api.py             - Entry point for python API. Declares LudwigModel.
│   ├── api_annotations.py - Provides @PublicAPI, @DevelopAPI annotation decorators
│   └── cli.py             - ludwig command-line tool
└── tests
    ├── integration_tests  - End-to-end tests of Ludwig workflows
    └── ludwig             - Unit tests. Subdirectories match ludwig/ structure
The codebase is organized in a modular, datatype- and feature-centric way. Adding a feature for a new datatype requires only minimal edits to existing code:
- Add a module implementing the new feature
- Import it in the appropriate registry file i.e.
- Add the new module to the intended registries i.e.
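The registration steps above follow a common registry pattern. The following is a minimal sketch of that pattern, not Ludwig's actual implementation: `FEATURE_REGISTRY`, `register_feature`, `build_feature`, and `TemperatureFeature` are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Type

# Hypothetical registry: maps a datatype name to the class implementing it.
FEATURE_REGISTRY: Dict[str, Type] = {}


def register_feature(name: str) -> Callable[[Type], Type]:
    """Class decorator that records a feature class under `name`."""

    def wrap(cls: Type) -> Type:
        if name in FEATURE_REGISTRY:
            raise ValueError(f"feature type {name!r} is already registered")
        FEATURE_REGISTRY[name] = cls
        return cls

    return wrap


@register_feature("temperature")  # hypothetical new datatype
class TemperatureFeature:
    """Stand-in for the module implementing the new feature."""

    def describe(self) -> str:
        return "temperature feature"


def build_feature(name: str):
    """Look up a feature class by datatype name and instantiate it."""
    try:
        return FEATURE_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown feature type {name!r}") from None
```

With a registry like this, existing code only needs to import the new module so that the decorator runs; callers then resolve the feature by name, for example `build_feature("temperature")`.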
All datatype-specific logic lives in the corresponding feature module, all of which are under
Feature classes provide raw data preprocessing logic specific to each data type in datatype mixin classes, i.e.
Feature mixins contain data preprocessing functions to obtain feature metadata (
dataset-wide operations to collect things like min, max, average, vocabulary, etc.) and to transform raw data into
tensors using the previously calculated metadata (
add_feature_data, which usually work on a dataset row basis).
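The two preprocessing phases described above (a dataset-wide pass that collects metadata, then a per-row transformation that uses it) can be sketched as follows. This is an illustrative sketch only: `NumberFeatureMixin` and `collect_metadata` are hypothetical names, the signature given to `add_feature_data` is a simplified assumption, and plain Python lists stand in for tensors.

```python
from typing import Dict, List


class NumberFeatureMixin:
    """Hypothetical mixin for a numeric datatype (not Ludwig's actual class)."""

    @staticmethod
    def collect_metadata(column: List[float]) -> Dict[str, float]:
        # Phase 1: dataset-wide pass that gathers statistics for later use.
        return {
            "min": min(column),
            "max": max(column),
            "mean": sum(column) / len(column),
        }

    @staticmethod
    def add_feature_data(column: List[float], metadata: Dict[str, float]) -> List[float]:
        # Phase 2: per-row transformation using the stored metadata.
        # Here each value is min-max scaled into the range [0, 1].
        span = metadata["max"] - metadata["min"]
        if span == 0:
            return [0.0 for _ in column]
        return [(value - metadata["min"]) / span for value in column]


raw = [10.0, 20.0, 40.0]
meta = NumberFeatureMixin.collect_metadata(raw)
scaled = NumberFeatureMixin.add_feature_data(raw, meta)
```

Separating the phases means the metadata computed on the training set can be saved and reused to transform new data at prediction time in exactly the same way.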
Output features also contain datatype-specific logic for postprocessing, which transforms model predictions back into data space, and for computing output metrics such as loss or accuracy.
Encoders and decoders are modularized as well (they are under
ludwig/decoders/ respectively) so
that they can be used by multiple features. For example, sequence encoders are shared by text, sequence, and timeseries features.
Various reusable model architecture components are also split into dedicated modules (e.g. convolutional
modules, fully connected layers, and attention), which are available in
Training and Inference
The training logic resides in
ludwig/trainers/trainer.py, which initializes a training session, feeds the data, and
executes the training loop. Prediction logic including batch prediction and evaluation resides in
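The three responsibilities named above (initializing a session, feeding data, and running the training loop) can be illustrated with a deliberately tiny, dependency-free sketch. `TinyTrainer` is a hypothetical class that fits a line with mini-batch gradient descent; it is not Ludwig's trainer and makes no claim about how that trainer is written.

```python
import random
from typing import List, Tuple


class TinyTrainer:
    """Hypothetical trainer fitting y = w * x + b with mini-batch SGD."""

    def __init__(self, learning_rate: float = 0.05, batch_size: int = 4, seed: int = 0):
        # Initialize the training session: parameters and hyperparameters.
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def train(self, data: List[Tuple[float, float]], epochs: int) -> List[float]:
        losses = []
        for _ in range(epochs):
            # Feed the data: shuffle once per epoch, then slice into batches.
            rows = data[:]
            self.rng.shuffle(rows)
            epoch_loss = 0.0
            for start in range(0, len(rows), self.batch_size):
                batch = rows[start:start + self.batch_size]
                grad_w = grad_b = 0.0
                for x, y in batch:
                    err = self.w * x + self.b - y
                    grad_w += 2 * err * x / len(batch)
                    grad_b += 2 * err / len(batch)
                    epoch_loss += err * err
                # One gradient step on the mean squared error of the batch.
                self.w -= self.learning_rate * grad_w
                self.b -= self.learning_rate * grad_b
            losses.append(epoch_loss / len(rows))
        return losses

    def predict(self, xs: List[float]) -> List[float]:
        # Batch prediction with the learned parameters.
        return [self.w * x + self.b for x in xs]


# Noise-free synthetic data drawn from y = 3x + 1.
data = [(x / 10, 3.0 * x / 10 + 1.0) for x in range(20)]
trainer = TinyTrainer()
losses = trainer.train(data, epochs=200)
```

Keeping prediction in a separate method mirrors the split described above between training logic and prediction logic.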
The command-line interface is managed by the
ludwig/cli.py script, which imports the other scripts in the
top-level directory; these implement the individual sub-commands (experiment, evaluate, export, visualize, etc.).
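A CLI entry point that dispatches to per-command scripts can be sketched with the standard library's `argparse`. This is a minimal sketch under assumptions, not the contents of `ludwig/cli.py`: the `toycli` program name, the `COMMANDS` table, and the `run_*` handlers are hypothetical, and each handler stands in for a separate script.

```python
import argparse
from typing import Callable, Dict, List


def run_experiment(args: List[str]) -> str:
    # Stand-in for a script implementing the "experiment" sub-command.
    return f"experiment called with {args}"


def run_evaluate(args: List[str]) -> str:
    # Stand-in for a script implementing the "evaluate" sub-command.
    return f"evaluate called with {args}"


# Hypothetical dispatch table: sub-command name -> handler.
COMMANDS: Dict[str, Callable[[List[str]], str]] = {
    "experiment": run_experiment,
    "evaluate": run_evaluate,
}


def main(argv: List[str]) -> str:
    parser = argparse.ArgumentParser(prog="toycli")
    parser.add_argument("command", choices=sorted(COMMANDS))
    # Parse only the sub-command name; the handler receives the remaining
    # arguments and is free to parse them with its own parser.
    parsed = parser.parse_args(argv[:1])
    return COMMANDS[parsed.command](argv[1:])
```

For example, `main(["experiment", "--config", "c.yaml"])` routes to `run_experiment` and passes it `["--config", "c.yaml"]`. Letting each sub-command parse its own arguments keeps the entry point small as commands are added.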
The programmatic interface (which is also used by the CLI commands) is available in the
All test code is in the
tests/ directory. The
tests/integration_tests/ subdirectory contains test cases which aim
to provide end-to-end test coverage of all workflows provided by Ludwig.
The tests/ludwig/ directory contains unit tests, organized in a subdirectory tree parallel to the
tree. For more details on testing, see Style Guidelines and Tests.
Hyperparameter optimization logic is implemented in the scripts in the
The ludwig/utils/ package contains various internal utilities used by Ludwig Python modules.
The ludwig/contribs/ package contains user-contributed code that integrates with external libraries.