Ludwig is the open-source declarative deep learning framework for building, fine-tuning, and deploying custom models — across tabular data, text, images, and audio — without writing training loops.
Most of the effort in an ML project goes into infrastructure: data loaders, training loops, distributed setup, export pipelines. Ludwig inverts this. You describe your model in YAML. Ludwig builds and runs it.
You get the simplicity of AutoML with the flexibility of raw PyTorch. Expert-level control is always one config field away — but you never have to touch it if you don't need to.
# describe your model in YAML
input_features:
  - name: review_text
    type: text
    encoder:
      type: bert
output_features:
  - name: sentiment
    type: category
trainer:
  epochs: 5
  learning_rate: 2.0e-5
$ ludwig train --config model.yaml \
--dataset reviews.csv
✓ Preprocessing ── 12,500 rows
✓ Training ──────── epoch 5/5
✓ Test accuracy ─── 94.2%
✓ Saved ─────────── results/
From prototype to production — Ludwig handles the full ML lifecycle without requiring you to write infrastructure code.
Define your entire ML pipeline — preprocessing, encoders, architecture, training, HPO — in a single validated YAML file. No boilerplate training loops.
Mix tabular features, raw text, images, audio, and time series in a single model. Train multiple outputs simultaneously. No other framework makes this as seamless.
SFT, DPO, KTO, ORPO, and GRPO in one framework. LoRA, QLoRA, DoRA, VeRA with 4-bit quantization. Merge multi-adapter models with TIES, DARE, or SVD.
Add a backend: ray line and your local job becomes a distributed Ray cluster job. Zero rewrites, zero surprises.
Built-in HPO with Ray Tune and Optuna. Auto, TPE, GP, and CMA-ES samplers. SQLite or PostgreSQL persistence. No external orchestration needed.
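As an illustration, hyperparameter search is declared in the same config file as the model. A minimal sketch for the review-sentiment model above; exact keys and available samplers can vary by Ludwig version:

hyperopt:
  goal: maximize
  metric: accuracy
  output_feature: sentiment
  parameters:
    trainer.learning_rate:
      space: loguniform
      lower: 1.0e-5
      upper: 1.0e-3
  executor:
    type: ray
    num_samples: 20

$ ludwig hyperopt --config model.yaml --dataset reviews.csv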
One command to serve your model as a REST API. Export to SafeTensors, ONNX, or torch.export. Prebuilt Docker images for CPU, GPU, Ray. Upload to HuggingFace Hub.
First-class integrations with W&B, MLflow, TensorBoard, Comet ML, and Aim. Auto-generated training reports and visualizations out of the box.
Plug in custom encoders, decoders, combiners, loss functions, and metrics. Use any HuggingFace model as a backbone. Ludwig is a framework, not a cage.
One-line auto_train() finds a strong baseline automatically. Built-in feature importance, model explainability, and rich visualizations help you understand what the model learned.
Ludwig handles data preprocessing for every feature type automatically — no custom pipelines required.
BERT, GPT-2, LLaMA, any HuggingFace model as encoder
Numeric, categorical, binary, set, bag — all handled
ResNet, EfficientNet, DINOv2, ViT — classification & segmentation
Wav2Vec2, raw audio encoders, speech recognition
Forecasting with efficient O(window) preprocessing
H3 hexagonal indexing, location features
Dense embedding vectors and learned representations
Automatic cyclic encoding of temporal features
Token sequences, genomic data, protein chains, discrete symbols
Deep SVDD, Deep SAD, DROCC — unsupervised and semi-supervised anomaly scoring
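Every type in the list above is declared the same way as the text and category features shown earlier. A minimal sketch with hypothetical column names:

input_features:
  - name: pickup_cell
    type: h3        # H3 hexagonal index
  - name: signup_date
    type: date      # cyclic temporal encoding is automatic
  - name: profile_embedding
    type: vector
  - name: dna_segment
    type: sequence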
Ludwig gives you the complete alignment stack — from supervised instruction tuning to reward-model-free RL — in a single YAML file. Run Llama, Mistral, Qwen, or any HuggingFace model with 4-bit QLoRA on a single consumer GPU.
New in v0.15: GRPO (Group Relative Policy Optimization) for reward-model-free RLHF. Train policy models directly from rule-based or custom reward signals.
model_type: llm
base_model: meta-llama/Llama-3.1-8B
quantization:
  bits: 4
adapter:
  type: lora
  r: 16
  alpha: 32
input_features:
  - name: instruction
    type: text
output_features:
  - name: response
    type: text
trainer:
  type: finetune
  epochs: 3
  learning_rate: 1.0e-4
  batch_size: 1
  gradient_accumulation_steps: 16
$ ludwig train --config llm_finetune.yaml \
--dataset "ludwig://alpaca"
$ ludwig serve --model_path results/
input_features:
  - name: text
    type: text
output_features:
  - name: label
    type: category
trainer:
  epochs: 10
# ↓ add this to go distributed
backend:
  type: ray
  trainer:
    use_gpu: true
    num_workers: 8
    strategy: fsdp
Ludwig's backend config key transparently handles data sharding, gradient sync, and checkpointing — without changing your model definition.
Train locally with CPU or GPU. Fast iteration, no setup.
One config change. DDP, FSDP, or DeepSpeed — your call.
Native KubeRay integration for production cluster scheduling.
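For an existing Ray or KubeRay cluster, one way to hand over the same training command is the Ray Jobs API. A sketch, assuming the cluster dashboard address is reachable and the config and dataset live in the working directory:

$ ray job submit --address http://<head-node>:8265 --working-dir . \
    -- ludwig train --config model.yaml --dataset data.csv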
Install Ludwig, pick a use case, run one command. That's it.
input_features:
  - name: review_content
    type: text
    encoder:
      type: bert   # or: embed, roberta, deberta, xlnet…
output_features:
  - name: recommended
    type: binary
trainer:
  epochs: 5
  learning_rate: 2.0e-5
─────────────────────────────────────────────
$ pip install ludwig[text]
$ ludwig train --config text_classifier.yaml --dataset rotten_tomatoes.csv
$ ludwig predict --model_path results/ --dataset new_reviews.csv
$ ludwig serve --model_path results/
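ludwig serve starts a FastAPI server whose /predict endpoint accepts one form field per input feature. A sketch, assuming the default host and port:

$ curl http://0.0.0.0:8000/predict -X POST \
    -F "review_content=One of the sharpest comedies of the year."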
model_type: llm
base_model: meta-llama/Llama-3.1-8B-Instruct
quantization:
  bits: 4   # 4-bit QLoRA — runs on a single consumer GPU
adapter:
  type: lora
  r: 16
  alpha: 32
prompt:
  template: |
    ### Instruction:
    {instruction}
    ### Response:
input_features:
  - name: instruction
    type: text
output_features:
  - name: response
    type: text
trainer:
  type: finetune
  epochs: 3
  learning_rate: 1.0e-4
  batch_size: 1
  gradient_accumulation_steps: 16
─────────────────────────────────────────────
$ pip install ludwig[llm]
$ ludwig train --config llm_sft.yaml --dataset "ludwig://alpaca"
$ ludwig upload_to_hf_hub --model_path results/ --repo my-org/my-model
model_type: llm
base_model: Qwen/Qwen2.5-7B-Instruct
adapter:
  type: lora
quantization:
  bits: 4
input_features:
  - name: prompt
    type: text
output_features:
  - name: response
    type: text
trainer:
  type: grpo            # Group Relative Policy Optimization
  reward_fn: accuracy   # custom or built-in reward signal
  num_generations: 8
  epochs: 1
  learning_rate: 5.0e-6
─────────────────────────────────────────────
$ pip install ludwig[llm]
$ ludwig train --config grpo.yaml --dataset math_problems.csv
# mix any modalities — Ludwig handles the rest
input_features:
  - name: product_image
    type: image
    encoder:
      type: dinov2
  - name: description
    type: text
    encoder:
      type: bert
  - name: price
    type: number
  - name: brand
    type: category
output_features:
  - name: category
    type: category
  - name: is_premium
    type: binary   # multi-task: two outputs at once
trainer:
  epochs: 10
─────────────────────────────────────────────
$ pip install "ludwig[text,vision]"
$ ludwig train --config multimodal.yaml --dataset products.csv
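By default Ludwig concatenates the encoded features before the output decoders; the combiner can also be chosen explicitly. A minimal addition to the config above:

combiner:
  type: transformer   # or: concat (default), tabnet, comparator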
input_features:
  - name: image_path
    type: image
    encoder:
      type: efficientnet   # or: resnet, vit, dinov2, stacked_cnn
      model_variant: b4
      use_pretrained: true
      trainable: true
output_features:
  - name: label
    type: category
trainer:
  epochs: 20
  learning_rate: 1.0e-4
  learning_rate_scheduler:
    decay: cosine
─────────────────────────────────────────────
$ pip install "ludwig[vision]"
$ ludwig train --config image_classifier.yaml --dataset images.csv
$ ludwig visualize --visualization confusion_matrix --ground_truth test.csv
input_features:
  - name: transaction_amount
    type: number
  - name: merchant_category
    type: category
  - name: hour_of_day
    type: number
  - name: country
    type: category
  - name: is_new_device
    type: binary
output_features:
  - name: is_fraud
    type: binary
trainer:
  epochs: 30
  class_weights: balanced   # handles class imbalance
─────────────────────────────────────────────
$ pip install ludwig
$ ludwig train --config fraud_detection.yaml --dataset transactions.csv
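Because fraud labels are heavily imbalanced, raw accuracy alone is not very informative; Ludwig's evaluate command reports the model's full evaluation metrics on a held-out set. A sketch, where test.csv is a hypothetical held-out split:

$ ludwig evaluate --model_path results/ --dataset test.csv --output_directory eval/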
input_features:
  - name: sales_history
    type: timeseries
    encoder:
      type: transformer
    preprocessing:
      window_size: 90   # 90 days lookback
  - name: store_id
    type: category
  - name: month
    type: number
output_features:
  - name: sales_forecast
    type: timeseries
    decoder:
      horizon: 30   # predict next 30 days
trainer:
  epochs: 50
  learning_rate: 1.0e-3
─────────────────────────────────────────────
$ pip install ludwig
$ ludwig train --config forecasting.yaml --dataset sales.csv
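Producing the next 30-day forecast is the same predict call used in the other examples. A sketch, where latest_window.csv is a hypothetical file holding the most recent 90-day window per store:

$ ludwig predict --model_path results/ --dataset latest_window.csv --output_directory forecasts/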
model_type: llm
base_model: mistralai/Mistral-7B-v0.3
adapter:
  type: lora
input_features:
  - name: prompt
    type: text
output_features:
  - name: response
    type: text
trainer:
  type: finetune
  epochs: 3
backend:
  type: ray   # ← the only change from local training
  trainer:
    use_gpu: true
    num_workers: 4   # 4 × GPU workers
    strategy: fsdp   # or: ddp, deepspeed
    resources_per_worker:
      GPU: 1
─────────────────────────────────────────────
$ pip install "ludwig[llm,distributed]"
$ ludwig train --config distributed.yaml --dataset data.csv
import pandas as pd
from ludwig.api import LudwigModel
config = {
    "input_features": [
        {"name": "text", "type": "text",
         "encoder": {"type": "bert"}}
    ],
    "output_features": [
        {"name": "label", "type": "category"}
    ],
    "trainer": {"epochs": 5},
}
model = LudwigModel(config, logging_level="INFO")
df = pd.read_csv("dataset.csv")
# train, evaluate, predict
results = model.train(dataset=df)
preds, _ = model.predict(dataset=df)
# one-line AutoML baseline
from ludwig.automl import auto_train
automl = auto_train(
    dataset=df, target="label", time_limit_s=3600
)
# export model
model.save("./my_model")
model.save_torchscript("./my_model_export")
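A saved model directory can be reloaded later for inference with the same API. A short sketch (new_data.csv is a placeholder path):

from ludwig.api import LudwigModel
# reload the model saved above and score new data
loaded = LudwigModel.load("./my_model")
new_preds, _ = loaded.predict(dataset="new_data.csv")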
Every example ships with a config file, dataset, and expected results.
Fine-tune Llama or Mistral with LoRA on a custom dataset.
Reward-model-free RL and preference learning for LLMs.
Classify movie reviews with BERT — no PyTorch code needed.
MNIST, CIFAR, custom datasets with ResNet / ViT / DINOv2.
Pixel-level classification with UNet, SegFormer, FPN.
Detect anomalies in structured transaction data.
Multi-step forecasting with native timeseries output features.
Automated HPO with Optuna or Ray Tune, built in.
Token-level tagging with sequence output features.
Transcribe and classify audio with Wav2Vec2 encoders.
Answer natural-language questions about images in one model.
Multi-task, anomaly detection, seq2seq, machine translation, and more.
Beginners can get a strong baseline from auto_train() without knowing the internals. Experts can plug in custom PyTorch encoders, control activation functions, and tune every hyperparameter. The abstraction is a choice, not a ceiling.

4-bit quantization (QLoRA, e.g. via torchao) means you can fine-tune a 7B model on a single consumer GPU. You can also merge multiple trained adapters using TIES, DARE, or SVD.

The backend: ray config key enables multi-GPU DDP, FSDP, and DeepSpeed training without any code changes. For datasets too large to fit in memory, Ludwig supports lazy Dask-based preprocessing and efficient Ray Data pipelines.

Models export to SafeTensors, ONNX, or torch.export, and Ludwig includes a FastAPI REST server via ludwig serve. For LLM serving, Ludwig integrates with vLLM for high-throughput inference. You can also upload to HuggingFace Hub with auto-generated model cards.

Built-in datasets ship under the ludwig:// scheme (e.g. ludwig://alpaca, ludwig://mnist), and Ludwig can also pull datasets from HuggingFace Datasets and Kaggle.