Add a Loss Function
At a high level, a loss function evaluates how well a model predicts a dataset. Loss functions should always output a scalar. Lower loss corresponds to a better fit, so the objective of training is to minimize the loss.
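As a minimal illustration in plain PyTorch (independent of Ludwig), the sketch below uses torch.nn.MSELoss, whose default reduction averages over the batch and returns a 0-dimensional tensor, and shows that a few gradient steps on that scalar lower it. The toy data and the linear model are made up for illustration.

```python
import torch

torch.manual_seed(0)

# Toy regression data: y = 3x + 1 (made-up values for illustration).
x = torch.linspace(-1.0, 1.0, steps=32).unsqueeze(1)
y = 3.0 * x + 1.0

model = torch.nn.Linear(1, 1)
loss_fn = torch.nn.MSELoss()  # default reduction="mean" -> scalar output
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

initial_loss = loss_fn(model(x), y)
print(initial_loss.dim())  # 0, i.e. a scalar tensor

for _ in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)  # a scalar, so backward() needs no arguments
    loss.backward()
    optimizer.step()

final_loss = loss_fn(model(x), y)
print(initial_loss.item(), final_loss.item())  # training lowers the loss
```

The scalar requirement is what lets the optimizer treat training as minimizing a single number.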
Ludwig losses conform to the torch.nn.Module interface, and are declared in ludwig/modules/loss_modules.py. Before implementing a new loss from scratch, check the documentation of torch.nn loss functions to see if the desired loss is available. Adding a torch loss to Ludwig is simpler than implementing a loss from scratch.
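For example, mean absolute error already exists as torch.nn.L1Loss. A quick check in plain PyTorch (not Ludwig code) confirms it follows the loss(model(input), target) pattern and returns a scalar; the tensors are arbitrary example values.

```python
import torch

# torch.nn already ships many losses; L1Loss is mean absolute error.
assert hasattr(torch.nn, "L1Loss")

predictions = torch.tensor([2.5, 0.0, 2.0])
targets = torch.tensor([3.0, -0.5, 2.0])

mae = torch.nn.L1Loss()  # default reduction="mean"
loss = mae(predictions, targets)
print(loss.item())  # (0.5 + 0.5 + 0.0) / 3, approximately 0.3333
```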
Add a torch loss to Ludwig
Torch losses whose call signature takes model outputs and targets, i.e., loss(model(input), target), can be added to Ludwig easily by declaring a trivial subclass in ludwig/modules/loss_modules.py and registering the loss for one or more output feature types. This example adds MAELoss (mean absolute error loss) to Ludwig:
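```python
import torch

# Declared in ludwig/modules/loss_modules.py, where register_loss,
# LogitsInputsMixin, and the feature type constants are assumed to be in scope.
@register_loss("mean_absolute_error", [NUMBER, TIMESERIES, VECTOR])
class MAELoss(torch.nn.L1Loss, LogitsInputsMixin):
    def __init__(self, **kwargs):
        # L1Loss needs no config parameters; **kwargs absorbs any passed in.
        super().__init__()
```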
The @register_loss decorator registers the loss under the name mean_absolute_error, and indicates it is supported for NUMBER, TIMESERIES, and VECTOR output features.
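To make the registration step concrete, here is a toy stand-in for a name-based loss registry. LOSS_REGISTRY, toy_register_loss, ToyMAELoss, and the lowercase feature-type strings are all hypothetical; the sketch only illustrates the decorator pattern and is not Ludwig's actual implementation of register_loss.

```python
from typing import Dict, List, Type

import torch

# Hypothetical registry: feature type -> {loss name -> loss class}.
LOSS_REGISTRY: Dict[str, Dict[str, Type[torch.nn.Module]]] = {}


def toy_register_loss(name: str, features: List[str]):
    """Hypothetical stand-in for a register_loss-style decorator."""

    def wrap(cls: Type[torch.nn.Module]) -> Type[torch.nn.Module]:
        for feature in features:
            LOSS_REGISTRY.setdefault(feature, {})[name] = cls
        return cls  # the class itself is returned unchanged

    return wrap


@toy_register_loss("mean_absolute_error", ["number", "timeseries", "vector"])
class ToyMAELoss(torch.nn.L1Loss):
    def __init__(self, **kwargs):
        super().__init__()


# Look up and instantiate the loss by name, as a config-driven system might.
loss_cls = LOSS_REGISTRY["number"]["mean_absolute_error"]
loss_module = loss_cls(some_unused_option=True)  # extra kwargs are absorbed
loss = loss_module(torch.tensor([1.0, 2.0]), torch.tensor([1.5, 2.0]))
print(loss.item())  # 0.25
```

Registering per feature type is what lets a lookup fail fast when a loss is requested for an output feature it was never declared for.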
Implement a loss from scratch
Implement loss function
To implement a new loss function, we recommend first implementing it as a function of logits and labels, plus any other configuration parameters. For this example, let's suppose we have implemented the tempered softmax cross-entropy loss from "Robust Bi-Tempered Logistic Loss Based on Bregman Divergences". This loss function takes two constant parameters, t1 and t2, which we'd like to allow users to specify in the config.
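For orientation, the quantities involved can be summarized as follows, restated from that paper for logits a and one-hot labels y (a summary, not Ludwig code; consult the paper for the full derivation). The tempered logarithm and exponential take a temperature t and reduce to the ordinary log and exp as t approaches 1:

$$
\log_t(x) = \frac{x^{1-t} - 1}{1 - t}, \qquad \exp_t(x) = \left[1 + (1 - t)\,x\right]_+^{1/(1-t)}
$$

The tempered softmax uses t2, with a normalizer λ chosen so that the outputs sum to one, and the loss compares labels and predictions through t1:

$$
\hat{y}_i = \exp_{t_2}\!\big(a_i - \lambda_{t_2}(a)\big), \qquad \sum_i \hat{y}_i = 1
$$

$$
L(y, \hat{y}) = \sum_i \left[ y_i \big(\log_{t_1} y_i - \log_{t_1} \hat{y}_i\big) - \frac{1}{2 - t_1}\big(y_i^{2 - t_1} - \hat{y}_i^{2 - t_1}\big) \right]
$$

With t1 = t2 = 1 (and the convention 0 log 0 = 0), the second term sums to 1 - 1 = 0 and the loss reduces to the standard softmax cross-entropy, -log ŷ_c for the true class c.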
Assuming we have the following function:
```python
import torch


def tempered_softmax_cross_entropy_loss(
    logits: torch.Tensor, labels: torch.Tensor, t1: float, t2: float
) -> torch.Tensor:
    """Computes the loss and returns the result as a torch.Tensor."""
    ...  # Implementation omitted.
```
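If you want something runnable to experiment with, the sketch below is one possible implementation for integer class labels, restricted to t2 >= 1: it finds the normalizer with a fixed-point iteration, which is only used here for t2 > 1, and the t2 < 1 case (which needs a different normalization) is left out. The helpers log_t, exp_t, and tempered_softmax, the iteration count, and the 1e-10 smoothing constant are illustrative choices of this sketch, not part of Ludwig.

```python
import torch
import torch.nn.functional as F


def log_t(x: torch.Tensor, t: float) -> torch.Tensor:
    """Hypothetical helper: tempered logarithm; equals torch.log when t == 1."""
    if t == 1.0:
        return torch.log(x)
    return (x.pow(1.0 - t) - 1.0) / (1.0 - t)


def exp_t(x: torch.Tensor, t: float) -> torch.Tensor:
    """Hypothetical helper: tempered exponential; equals torch.exp when t == 1."""
    if t == 1.0:
        return torch.exp(x)
    return torch.relu(1.0 + (1.0 - t) * x).pow(1.0 / (1.0 - t))


def tempered_softmax(logits: torch.Tensor, t: float, num_iters: int = 20) -> torch.Tensor:
    """Hypothetical helper: tempered softmax over the last dim (t >= 1 only)."""
    if t == 1.0:
        return torch.softmax(logits, dim=-1)
    if t < 1.0:
        raise NotImplementedError("this sketch only handles t >= 1")
    # Fixed-point iteration for the normalizer lambda so probabilities sum to 1.
    mu = logits.max(dim=-1, keepdim=True).values
    shifted = logits - mu
    scaled = shifted
    for _ in range(num_iters):
        partition = exp_t(scaled, t).sum(dim=-1, keepdim=True)
        scaled = shifted * partition.pow(1.0 - t)
    partition = exp_t(scaled, t).sum(dim=-1, keepdim=True)
    normalizer = -log_t(1.0 / partition, t) + mu
    return exp_t(logits - normalizer, t)


def tempered_softmax_cross_entropy_loss(
    logits: torch.Tensor, labels: torch.Tensor, t1: float, t2: float
) -> torch.Tensor:
    """Per-example bi-tempered loss for int64 class labels; returns shape [batch]."""
    onehot = F.one_hot(labels, num_classes=logits.shape[-1]).to(logits.dtype)
    probs = tempered_softmax(logits, t2)
    eps = 1e-10  # keeps log_t finite at the zero entries of the one-hot labels
    term1 = (log_t(onehot + eps, t1) - log_t(probs, t1)) * onehot
    term2 = (onehot.pow(2.0 - t1) - probs.pow(2.0 - t1)) / (2.0 - t1)
    return (term1 - term2).sum(dim=-1)
```

A useful sanity check for any implementation is that t1 = t2 = 1 reproduces torch.nn.functional.cross_entropy with reduction="none".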
Define and register module
Next, we'll define a module class which computes our loss function, and add it to the loss registry for output features with type CATEGORY.
LogitsInputsMixin tells Ludwig that this loss should be called with the output
logits, which are the feature decoder outputs before normalization to a probability distribution.
```python
@register_loss("tempered_softmax_cross_entropy", [CATEGORY])
class TemperedSoftmaxCrossEntropy(torch.nn.Module, LogitsInputsMixin):
```
It is possible to define losses on other outputs besides logits, but this is not used in Ludwig today. For example, loss could be computed over probabilities, but it is usually more numerically stable to compute it from logits (rather than backpropagating the loss through a softmax function).
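The stability point can be seen directly in plain PyTorch. With a large gap between logits, the smaller softmax probability underflows to zero in float32, so taking its log yields -inf, while torch.log_softmax, which works from the logits, stays finite. The logit values are arbitrary.

```python
import torch

logits = torch.tensor([[0.0, 200.0]])  # float32 by default

# Route 1: normalize to probabilities first, then take the log.
probs = torch.softmax(logits, dim=-1)
naive_log_probs = torch.log(probs)  # exp(-200) underflows to 0.0, so its log is -inf

# Route 2: compute log-probabilities directly from the logits.
stable_log_probs = torch.log_softmax(logits, dim=-1)  # first entry stays at -200
```

Any loss built on the first route would produce infinite values and unusable gradients for such inputs, which is one reason to feed logits to the loss.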
The loss constructor will receive any parameters specified in the config as kwargs. It must provide reasonable defaults for all arguments.
```python
    def __init__(self, t1: float = 1.0, t2: float = 1.0, **kwargs):
        # Config parameters arrive as kwargs; unused ones are absorbed by **kwargs.
        super().__init__()
        self.t1 = t1
        self.t2 = t2
```
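As an illustration, suppose the loss parameters come from an output feature's loss section in a config like the dict below. The exact config layout and the assumption that all of its keys are forwarded as keyword arguments are made up for this sketch; ToyTemperedLoss is a hypothetical stand-in that only mirrors the constructor pattern, without registering anything with Ludwig.

```python
import torch

# Assumed shape of a config fragment selecting the loss by its registered name.
output_feature = {
    "name": "label",
    "type": "category",
    "loss": {"type": "tempered_softmax_cross_entropy", "t1": 0.8},
}


class ToyTemperedLoss(torch.nn.Module):
    """Hypothetical stand-in mirroring the constructor described above."""

    def __init__(self, t1: float = 1.0, t2: float = 1.0, **kwargs):
        super().__init__()
        self.t1 = t1  # supplied by the config
        self.t2 = t2  # falls back to its default because the config omits it


# Forward every loss option as a keyword argument; **kwargs swallows keys
# the constructor does not use (here, "type").
loss = ToyTemperedLoss(**output_feature["loss"])
print(loss.t1, loss.t2)  # 0.8 1.0
```

Because t2 was omitted, the default is what keeps the loss usable, which is why every argument needs a reasonable default.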
The forward method is responsible for computing the loss. Here we'll call tempered_softmax_cross_entropy_loss after ensuring its inputs are the correct type, and return its output averaged over the batch.
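```python
    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Integer class indices are expected, so cast the targets to int64.
        labels = target.long()
        loss = tempered_softmax_cross_entropy_loss(logits, labels, self.t1, self.t2)
        # Reduce the per-example losses to a scalar by averaging over the batch.
        return torch.mean(loss)
```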
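Put together, the module can be exercised outside Ludwig to check shapes and gradients. In the sketch below, the Ludwig decorator and mixin are omitted, and tempered_softmax_cross_entropy_loss is replaced by a hypothetical placeholder_loss that returns per-example standard cross-entropy (the t1 = t2 = 1 special case), so the code runs with PyTorch alone. The targets are floats on purpose, to show why forward casts them with .long().

```python
import torch
import torch.nn.functional as F


def placeholder_loss(logits, labels, t1, t2):
    """Hypothetical placeholder: per-example cross-entropy, ignoring t1 and t2."""
    return F.cross_entropy(logits, labels, reduction="none")


class SketchTemperedSoftmaxCrossEntropy(torch.nn.Module):
    def __init__(self, t1: float = 1.0, t2: float = 1.0, **kwargs):
        super().__init__()
        self.t1 = t1
        self.t2 = t2

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        labels = target.long()  # class indices must be int64 for cross-entropy
        loss = placeholder_loss(logits, labels, self.t1, self.t2)
        return torch.mean(loss)  # average over the batch -> scalar


torch.manual_seed(0)
logits = torch.randn(8, 5, requires_grad=True)  # batch of 8, 5 classes
target = torch.randint(0, 5, (8,)).float()  # float targets, cast in forward

loss = SketchTemperedSoftmaxCrossEntropy()(logits, target)
loss.backward()
print(loss.dim(), logits.grad.shape)  # 0 torch.Size([8, 5])
```

Without the .long() cast, cross_entropy would interpret a float target of shape [8] as class probabilities and reject it for mismatching the [8, 5] logits.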