# Multi-Adapter PEFT
Ludwig supports training and deploying multiple named PEFT adapters on the same base model. This enables domain specialization, adapter ensembles, and efficient A/B testing — all without duplicating the large base model weights.
## When to use multi-adapter PEFT
| Use case | Description |
|---|---|
| Domain specialization | Train a "coding" adapter and a "chat" adapter separately, then merge for a model that does both |
| A/B testing | Switch between adapters at inference time without reloading the base model |
| Continual learning | Add a new adapter for each new task without forgetting previous ones |
| Adapter ensembles | Merge adapters trained on different data splits for better generalization |
## Configuration
Use `adapters:` (plural) instead of the singular `adapter:` to define multiple named adapters:
```yaml
model_type: llm
base_model: meta-llama/Llama-3.1-8B
adapters:
  adapters:
    coding:
      type: lora
      r: 16
      alpha: 32
      target_modules: ["q_proj", "v_proj"]
    chat:
      type: lora
      r: 8
      alpha: 16
      target_modules: ["q_proj", "v_proj"]
  active: coding  # which adapter is used at inference time
```
`adapter:` and `adapters:` are mutually exclusive; existing single-adapter configs work unchanged.
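For orientation, here are the two forms side by side as Python config dicts. This is a sketch: the adapter name `default` below is an arbitrary illustrative choice, not a reserved name.

```python
# Existing single-adapter form (still works unchanged)
single = {"adapter": {"type": "lora", "r": 16}}

# Equivalent multi-adapter form: the adapter gets a name ("default" here is
# an arbitrary illustrative choice) and is marked as the active adapter
multi = {
    "adapters": {
        "adapters": {"default": {"type": "lora", "r": 16}},
        "active": "default",
    }
}
```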
## Merging adapters
After training, merge multiple adapters into one using PEFT's `add_weighted_adapter()`:
```yaml
adapters:
  adapters:
    coding:
      type: lora
      r: 16
    chat:
      type: lora
      r: 8
  merge:
    name: coding_chat        # name of the merged adapter
    sources: [coding, chat]
    weights: [0.7, 0.3]      # relative weights; will be normalised
    combination_type: ties
    density: 0.7             # fraction of deltas to retain (for TIES/DARE)
  active: coding_chat
```
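Under the hood, this configuration maps onto PEFT's `add_weighted_adapter()`. As a rough sketch of the equivalent raw PEFT call, assuming `peft_model` is a `peft.PeftModel` that already has both adapters loaded under these names:

```python
# Merge the two named adapters into a new one, mirroring the YAML above
peft_model.add_weighted_adapter(
    adapters=["coding", "chat"],   # sources
    weights=[0.7, 0.3],            # relative weights
    adapter_name="coding_chat",    # name of the merged adapter
    combination_type="ties",
    density=0.7,                   # fraction of deltas to retain
)
peft_model.set_adapter("coding_chat")  # make the merged adapter active
```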
### Combination strategies
| Strategy | Paper | Description |
|---|---|---|
| `linear` | — | Weighted sum of LoRA weight deltas. Fast, no pruning. |
| `svd` | — | SVD-based merge; reduces rank after combining. |
| `ties` | Yadav et al., NeurIPS 2023 | Resolves sign conflicts between adapters before summing deltas. |
| `dare_linear` | Yu et al., ICML 2024 | Prunes a `(1 - density)` fraction of deltas, then performs a linear merge. |
| `dare_ties` | Yu et al., ICML 2024 | Prunes deltas, then applies TIES sign-conflict resolution. |
| `magnitude_prune` | — | Keeps only the top-magnitude deltas before merging. |
Picking a strategy:

- Start with `linear`; it is the fastest and often competitive.
- Use `ties` when adapters were trained on conflicting tasks (coding vs chat, different languages).
- Use `dare_linear` or `dare_ties` when you want sparse, memory-efficient merged adapters: `density: 0.7` means keep 70% of the deltas, and lower values produce sparser adapters (see the sketch after this list).
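To make `linear` and `density` concrete, here is a toy sketch of the arithmetic on reconstructed weight deltas. It is illustrative only, not how PEFT implements the merge (PEFT operates on the low-rank A/B factors per target module):

```python
import torch

# Toy weight deltas for one module, as if reconstructed from two LoRA
# adapters (delta_W = B @ A); small shapes for illustration only
deltas = [torch.randn(64, 64), torch.randn(64, 64)]
weights = torch.tensor([0.7, 0.3])
weights = weights / weights.sum()  # relative weights are normalised

# linear: a plain weighted sum of the deltas
linear_merge = sum(w * d for w, d in zip(weights, deltas))

# dare_linear: randomly drop a (1 - density) fraction of each delta and
# rescale the survivors by 1/density, then take the same weighted sum
density = 0.7
pruned = [
    torch.where(torch.rand_like(d) < density, d / density, torch.zeros_like(d))
    for d in deltas
]
dare_merge = sum(w * d for w, d in zip(weights, pruned))
```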
## Training multiple adapters
Train adapters separately and merge at the end:
```python
import copy

from ludwig.api import LudwigModel

# Train the coding adapter
config_coding = {
    "model_type": "llm",
    "base_model": "meta-llama/Llama-3.1-8B",
    "adapters": {"adapters": {"coding": {"type": "lora", "r": 16}}},
    "input_features": [{"name": "prompt", "type": "text"}],
    "output_features": [{"name": "response", "type": "text"}],
    "trainer": {"type": "finetune", "epochs": 3},
}
model = LudwigModel(config=config_coding)
model.train(dataset="coding_dataset.csv", output_directory="./coding_model")

# Train the chat adapter. Deep-copy the config: a shallow copy would share
# the nested "adapters" dict and mutate config_coding as well.
config_chat = copy.deepcopy(config_coding)
config_chat["adapters"]["adapters"] = {"chat": {"type": "lora", "r": 8}}
model2 = LudwigModel(config=config_chat)
model2.train(dataset="chat_dataset.csv", output_directory="./chat_model")
```
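To use the two separately trained adapters together, you can attach both to a single base model with PEFT directly. A sketch, assuming the paths point at the PEFT-format adapter weights each Ludwig run saved (the paths below are placeholders; check your output directories for the actual location):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the shared base model once
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# Attach the first adapter under a name, then load the second onto the
# same base model (paths are placeholders)
model = PeftModel.from_pretrained(base, "./coding_model/adapter", adapter_name="coding")
model.load_adapter("./chat_model/adapter", adapter_name="chat")

model.set_adapter("coding")  # choose which adapter is active
```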
## Switching adapters at inference time
With multiple adapters loaded, switch between them without reloading the model:
```python
from ludwig.api import LudwigModel

model = LudwigModel.load("results/multi_adapter_model")

# Access the underlying PEFT model
peft_model = model.model.model

# Switch the active adapter, then predict with each
peft_model.set_adapter("coding")
coding_preds, _ = model.predict(dataset=coding_test_df)

peft_model.set_adapter("chat")
chat_preds, _ = model.predict(dataset=chat_test_df)
```
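If you switch back and forth frequently, a small helper (not part of Ludwig's API; shown here as a convenience pattern) restores the previously active adapter when the block exits:

```python
from contextlib import contextmanager

@contextmanager
def using_adapter(peft_model, name):
    """Temporarily activate `name`, restoring the previous adapter on exit."""
    previous = peft_model.active_adapter
    peft_model.set_adapter(name)
    try:
        yield peft_model
    finally:
        peft_model.set_adapter(previous)

# Continuing from the snippet above
with using_adapter(peft_model, "chat"):
    chat_preds, _ = model.predict(dataset=chat_test_df)
# the previously active adapter is restored here
```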
## Memory considerations
Multi-adapter PEFT loads all adapter weights simultaneously but shares the base model:
| Setup | Memory overhead vs single adapter |
|---|---|
| 2 × LoRA-r8 adapters | ~2× adapter memory (small relative to the base model) |
| Merged adapter | None; the same footprint as a single adapter |
| Adapter switching enabled | All adapter weights resident in VRAM |
For large numbers of adapters, consider training sequentially and using `merge:` to combine them; the merged adapter has the same memory footprint as a single adapter.
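For a sense of scale, here is a back-of-the-envelope estimate of adapter size, using approximate Llama-3.1-8B dimensions (assumed here: hidden size 4096, 32 layers, value projection width 1024 under grouped-query attention):

```python
# Back-of-the-envelope LoRA adapter size; dimensions are approximate
# Llama-3.1-8B values, so adjust for your model.
r = 8
layers = 32
# (in_features, out_features) for each targeted projection per layer
targets = [(4096, 4096), (4096, 1024)]  # q_proj, v_proj

# Each LoRA pair adds A (r x in) plus B (out x r) parameters
params = layers * sum(r * (d_in + d_out) for d_in, d_out in targets)
print(f"{params / 1e6:.1f}M params, ~{params * 2 / 1e6:.0f} MB in fp16")
# -> roughly 3.4M params, ~7 MB; tiny next to a ~16 GB fp16 base model
```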
## See also
- LLM Fine-Tuning — single-adapter PEFT with LoRA/QLoRA
- LLM configuration reference — full YAML schema
- Multi-adapter example — runnable walkthrough