Skip to content

Advanced PEFT Adapters

Advanced PEFT Adapters

Ludwig's PEFT integration (backed by HuggingFace PEFT) goes well beyond standard LoRA. PR #4146 adds new LoRA initializers that improve convergence and final quality, per-module rank/alpha overrides, and several new adapter families covering orthogonal, wavelet-based, and layer-norm-only tuning strategies.

This page collects config snippets and short explanations for each advanced option. For the full parameter reference see the LLM configuration docs.

LoRA initializers

Standard LoRA initializes B = 0 so the adapter is a no-op at the start of training. The initializers below start from a better point, which speeds up convergence and often improves the final metric.

PiSSA

Principal Singular Values and Singular Vectors Adaptation aligns the low-rank subspace with the top-r singular components of each pretrained weight matrix. The residual is kept frozen. PiSSA consistently outperforms standard LoRA at the same rank and requires no extra data.

model_type: llm
base_model: meta-llama/Llama-3.1-8B

adapter:
  type: lora
  r: 16
  alpha: 16
  init_lora_weights: pissa

trainer:
  type: finetune
  epochs: 3
  learning_rate: 1e-4

CorDA

Correlation-Driven LoRA Adaptation initializes the subspace from activation correlations computed on a small calibration batch. It is most effective when a representative in-domain sample is available at initialization time.

adapter:
  type: lora
  r: 16
  alpha: 16
  init_lora_weights: corda

Layer-Norm tuning

Layer-Norm tuning trains only the weight and bias parameters of LayerNorm/RMSNorm layers. It is the lightest adapter type available — often fewer than 0.1% of backbone parameters — and works well for domain adaptation of already-instruction-tuned models where the knowledge is largely intact and only the output distribution needs to shift.

model_type: llm
base_model: mistralai/Mistral-7B-Instruct-v0.3

input_features:
  - name: prompt
    type: text
output_features:
  - name: response
    type: text

adapter:
  type: ln_tuning

trainer:
  type: finetune
  epochs: 2
  learning_rate: 5e-4
  batch_size: 4
  gradient_accumulation_steps: 8

Orthogonal adapters

OFT

Orthogonal Fine-Tuning constrains weight updates to orthogonal transformations, preserving the hyperspherical geometry of the pretrained representations. This keeps the relative angles between token embeddings stable during fine-tuning and is particularly effective for tasks that depend on semantic similarity structure.

adapter:
  type: oft
  r: 8
  module_dropout: 0.0

HRA

Householder Reflection Adaptation parameterizes updates as a product of Householder reflections. It achieves a similar orthogonality guarantee to OFT but with fewer parameters per layer.

adapter:
  type: hra
  r: 8

Both OFT and HRA are drop-in replacements for LoRA in any Ludwig LLM config — just change the type field on adapter.

Wavelet-based tuning

WaveFT

WaveFT applies updates in the wavelet domain, concentrating the parameter budget on the frequency bands most perturbed during fine-tuning. It is especially useful for models that process structured signals (audio, images encoded as tokens) where frequency structure carries semantic meaning.

adapter:
  type: waveft
  r: 8
  alpha: 16

Vector-bank adapters

VBLoRA

Vector-Bank LoRA replaces the per-layer B matrix with a shared global dictionary of vectors. Each layer selects a subset of these vectors and linearly combines them. When many layers learn similar update directions this can reduce total parameter count significantly versus standard LoRA.

adapter:
  type: vblora
  r: 4
  num_vectors: 256
  vector_length: 256

Comparison table

Adapter Extra params (approx.) Key strength Best for
lora (default init) ~0.1–1% Versatile, well-studied General fine-tuning baseline
lora + pissa ~0.1–1% Better initialization, faster convergence When standard LoRA underfits
lora + corda ~0.1–1% Data-driven subspace alignment In-domain adaptation with calibration data
lora + loftq ~0.1–1% Minimises quantization error at init 4-bit QLoRA fine-tuning
ln_tuning <0.1% Extremely lightweight Domain shift with instruction-tuned base models
oft ~0.5–2% Preserves hyperspherical geometry Semantic similarity, generation fidelity
hra ~0.3–1% Orthogonal, fewer params than OFT Same as OFT with tighter parameter budget
waveft ~0.5–2% Frequency-domain concentration Audio/vision token models
vblora ~0.05–0.5% Shared vector bank across layers Very low parameter budgets
c3a ~0.2–1% Block-sparse updates Sparse activation models
tinylora ~0.05–0.5% Learned rank allocation Strict parameter count constraints