Fine-tuning for classification
Full fine-tuning
Ludwig supports full fine-tuning with any LLM on the Hugging Face Hub. For example:
input_features:
  - name: title
    type: text
    encoder:
      type: auto_transformer
      pretrained_model_name_or_path: bigscience/bloom-3b
      trainable: true
output_features:
  - name: class
    type: category
trainer:
  learning_rate: 1.0e-05
  epochs: 3
backend:
  type: ray
  trainer:
    strategy: fsdp
See a demonstration using the Ludwig Python API:
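As a rough sketch of what that workflow looks like in Python (the dataset file agnews.csv is a placeholder, and the Ray/FSDP backend section is omitted so this runs on a single machine), the same config can be passed to LudwigModel as a dict:

# Minimal sketch: full fine-tuning of bloom-3b for classification via the Python API.
# "agnews.csv" is a placeholder dataset containing "title" and "class" columns.
from ludwig.api import LudwigModel

config = {
    "input_features": [
        {
            "name": "title",
            "type": "text",
            "encoder": {
                "type": "auto_transformer",
                "pretrained_model_name_or_path": "bigscience/bloom-3b",
                "trainable": True,
            },
        }
    ],
    "output_features": [{"name": "class", "type": "category"}],
    "trainer": {"learning_rate": 1.0e-05, "epochs": 3},
}

model = LudwigModel(config=config)
train_stats, preprocessed_data, output_dir = model.train(dataset="agnews.csv")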
Decoder-only fine-tuning with cached encoder embeddings
Ludwig currently supports two variations for fine-tuning encoders, configured via the trainable encoder parameter:
- Modifying the weights of the pretrained encoder to adapt them to the downstream task (trainable=true).
- Keeping the pretrained encoder weights fixed and training a stack of dense layers that sit downstream as the combiner and decoder modules (trainable=false). This is sometimes distinguished as transfer learning.
Training can be over 50x faster when trainable=false with the following additional configuration adjustments (shown together in the sketch after the config below):
- Automatic mixed precision (AMP) training, which is available with both trainable=true and trainable=false.
- Cached encoder embeddings, which are only available when trainable=false.
- Approximate training set evaluation (evaluate_training_set=false), which reports training set metrics at the end of each epoch as a running aggregation of the metrics computed during training, rather than as a separate pass over the training set. Though this makes training metrics appear "noisy" in the early epochs, it generally results in a 33% speedup in training time.
input_features:
  - name: review
    type: text
    encoder:
      type: auto_transformer
      pretrained_model_name_or_path: bert-base-uncased
      trainable: false
    preprocessing:
      cache_encoder_embeddings: true
output_features:
  - name: sentiment
    type: category
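As a sketch of how the speedups listed above combine with cached embeddings, the trainer section below adds AMP and approximate training set evaluation. The AMP flag name use_mixed_precision is an assumption; check the trainer reference for your Ludwig version. The dataset file reviews.csv is a placeholder.

# Sketch only: cached-embeddings config with the speed optimizations added.
# "use_mixed_precision" is an assumed name for the AMP trainer flag; verify it
# against the Ludwig trainer reference. "reviews.csv" is a placeholder dataset.
from ludwig.api import LudwigModel

config = {
    "input_features": [
        {
            "name": "review",
            "type": "text",
            "encoder": {
                "type": "auto_transformer",
                "pretrained_model_name_or_path": "bert-base-uncased",
                "trainable": False,
            },
            "preprocessing": {"cache_encoder_embeddings": True},
        }
    ],
    "output_features": [{"name": "sentiment", "type": "category"}],
    "trainer": {
        "use_mixed_precision": True,      # AMP training
        "evaluate_training_set": False,   # running aggregation instead of a separate eval pass
    },
}

model = LudwigModel(config=config)
train_stats, _, _ = model.train(dataset="reviews.csv")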
Adapter-based fine-tuning
One of the biggest barriers to cost-effective fine-tuning for LLMs is the need to update billions of parameters at each training step. Parameter-efficient fine-tuning (PEFT) describes a collection of techniques that reduce the number of trainable parameters during fine-tuning to speed up training, and decrease the memory and disk space required to train large language models.
PEFT is a popular library from Hugging Face that implements a number of parameter-efficient fine-tuning strategies. In Ludwig v0.8, we provide native integration with PEFT, allowing you to leverage any of these techniques to fine-tune LLMs more efficiently with a single parameter change in the configuration.
One of the most commonly used PEFT adapters is low-rank adaptation (LoRA), which can now be enabled for any large language model in Ludwig with the "adapter" parameter:
adapter: lora
Additionally, any of the LoRA hyperparameters can be configured explicitly to override Ludwig's defaults:
adapter:
  type: lora
  r: 16
  alpha: 32
  dropout: 0.1
Adapters can be added to any LLM model type, or to any pretrained auto_transformer text encoder in Ludwig, with the same parameter options (see the sketch after the list below).
In Ludwig v0.8, we've added native support for the following PEFT techniques, with more to come. Read more about PEFT in Ludwig here.
- LoRA
- AdaLoRA
- Adaptation Prompt (a.k.a. LLaMA Adapter)
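For instance, here is a sketch of what enabling LoRA on the LLM model type might look like (the exact schema can differ between Ludwig versions; the base model, feature names, and dataset file here are illustrative):

# Sketch (schema may vary across Ludwig versions): LoRA on the LLM model type.
# The base model, "prompt"/"response" feature names, and "instructions.csv" are
# illustrative placeholders, not values from the docs above.
from ludwig.api import LudwigModel

config = {
    "model_type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",
    "adapter": {"type": "lora"},
    "input_features": [{"name": "prompt", "type": "text"}],
    "output_features": [{"name": "response", "type": "text"}],
    "trainer": {"type": "finetune", "learning_rate": 1.0e-04, "epochs": 3},
}

model = LudwigModel(config=config)
model.train(dataset="instructions.csv")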
In most frameworks, it would be a lot of work to take an LLM that generates text and adapt it to do classification or regression, but in Ludwig it's as simple as changing a few lines in the YAML config:
input_features:
  - name: review
    type: text
    encoder:
      type: auto_transformer
      pretrained_model_name_or_path: meta-llama/Llama-2-7b-hf
      trainable: true
      adapter: lora
output_features:
  - name: sentiment
    type: category
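Putting it together, here is a sketch of training this LoRA-adapted classifier and getting class predictions through the Python API (reviews.csv and new_reviews.csv are placeholder files, and access to the Llama-2 weights requires accepting the license on Hugging Face):

# Sketch: train the LoRA-adapted Llama-2 classifier above, then predict classes
# for new rows. "config.yaml" is the YAML config shown above saved to disk
# (a Python dict works too); the CSV files are placeholders with a "review"
# column ("sentiment" is also needed for training).
from ludwig.api import LudwigModel

model = LudwigModel(config="config.yaml")
model.train(dataset="reviews.csv")

# Predictions follow Ludwig's "<feature>_predictions" column naming convention.
predictions, _ = model.predict(dataset="new_reviews.csv")
print(predictions["sentiment_predictions"].head())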