Add a Loss Function

At a high level, a loss function evaluates how well a model predicts a dataset. Loss functions should always output a scalar. Lower loss corresponds to a better fit, so the objective of training is to minimize the loss.
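As a quick illustration of these two properties, the sketch below uses torch.nn.functional.mse_loss. The choice of mean squared error and the tensor values are assumptions made only for this example. It shows that the loss is a scalar tensor and that predictions closer to the targets yield a lower value.

```python
import torch
import torch.nn.functional as F

# Illustrative values only: three targets and two candidate sets of predictions.
targets = torch.tensor([1.0, 2.0, 3.0])
good_predictions = torch.tensor([1.1, 1.9, 3.2])
bad_predictions = torch.tensor([3.0, 0.0, 6.0])

# With the default reduction="mean", mse_loss returns a 0-dimensional (scalar) tensor.
good_loss = F.mse_loss(good_predictions, targets)
bad_loss = F.mse_loss(bad_predictions, targets)

# The predictions that fit the targets better produce the lower loss.
print(good_loss.dim(), good_loss.item(), bad_loss.item())
```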

Ludwig losses conform to the torch.nn.Module interface, and are declared in ludwig/modules/loss_modules.py. Before implementing a new loss from scratch, check the documentation of torch.nn loss functions to see if the desired loss is available. Adding a torch loss to Ludwig is simpler than implementing a loss from scratch.

Add a torch loss to Ludwig

Torch losses whose call signature takes model outputs and targets, that is, loss(model(input), target), can be added to Ludwig by declaring a trivial subclass in ludwig/modules/loss_modules.py and registering the loss for one or more output feature types. This example adds MAELoss (mean absolute error loss) to Ludwig:

@register_loss("mean_absolute_error", [NUMBER, TIMESERIES, VECTOR])
class MAELoss(torch.nn.L1Loss, LogitsInputsMixin):
    def __init__(self, **kwargs):
        super().__init__()

The @register_loss decorator registers the loss under the name mean_absolute_error, and indicates it is supported for NUMBER, TIMESERIES, and VECTOR output features.
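To make the registration step more concrete, here is a minimal sketch of how a decorator-based registry of this kind could work. It is not Ludwig's actual implementation: LOSS_REGISTRY, get_loss_class, the string values of the feature-type constants, and the placeholder MAELoss body are all assumptions introduced for illustration.

```python
from typing import Callable, Dict, List, Type

# Hypothetical registry: feature type -> {loss name -> loss class}.
LOSS_REGISTRY: Dict[str, Dict[str, Type]] = {}

# Assumed string values for the feature-type constants.
NUMBER, TIMESERIES, VECTOR = "number", "timeseries", "vector"


def register_loss(name: str, feature_types: List[str]) -> Callable[[Type], Type]:
    """Return a class decorator that records the class under `name` for each feature type."""
    def wrap(cls: Type) -> Type:
        for feature_type in feature_types:
            LOSS_REGISTRY.setdefault(feature_type, {})[name] = cls
        return cls  # The decorated class is returned unchanged.
    return wrap


@register_loss("mean_absolute_error", [NUMBER, TIMESERIES, VECTOR])
class MAELoss:
    """Placeholder body; the class in this guide subclasses torch.nn.L1Loss."""


def get_loss_class(feature_type: str, name: str) -> Type:
    """Look up the loss class registered for a feature type and loss name."""
    return LOSS_REGISTRY[feature_type][name]


print(get_loss_class(NUMBER, "mean_absolute_error").__name__)
```

The point of the pattern is that a loss becomes selectable by name for specific output feature types without any other code needing to import the class directly.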

Implement a loss from scratch

Implement loss function

To implement a new loss function, we recommend first implementing it as a function of logits and labels, plus any other configuration parameters. For this example, let's suppose we have implemented the tempered softmax from "Robust Bi-Tempered Logistic Loss Based on Bregman Divergences". This loss function takes two constant parameters, t1 and t2, which we'd like to allow users to specify in the config.

Assume we have the following function:

import torch


def tempered_softmax_cross_entropy_loss(
        logits: torch.Tensor,
        labels: torch.Tensor,
        t1: float, t2: float) -> torch.Tensor:
    # Computes the loss, returns the result as a torch.Tensor.
    ...  # Implementation omitted in this guide.
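For background on what the two parameters control, the definitions below summarize the bi-tempered loss as formulated in the cited paper. They are included as a reference sketch, not as a description of any Ludwig code. The symbols are assumptions of this summary: a denotes the logits for one example, y the one-hot encoding of its label, and \hat{y} the tempered softmax output.

```latex
\begin{align*}
\log_t(x) &= \frac{x^{1-t} - 1}{1 - t}, \qquad t \neq 1, \\
\exp_t(x) &= \bigl[\, 1 + (1 - t)\, x \,\bigr]_+^{1/(1-t)}, \qquad t \neq 1, \\
\hat{y}_i &= \exp_{t_2}\!\bigl( a_i - \lambda_{t_2}(a) \bigr),
  \quad \text{with } \lambda_{t_2}(a) \text{ chosen so that } \sum_i \hat{y}_i = 1, \\
L_{t_1}^{t_2}(a, y) &= \sum_i \Bigl( y_i \bigl( \log_{t_1} y_i - \log_{t_1} \hat{y}_i \bigr)
  - \tfrac{1}{2 - t_1} \bigl( y_i^{\,2 - t_1} - \hat{y}_i^{\,2 - t_1} \bigr) \Bigr).
\end{align*}
```

In the limit where t1 and t2 both approach 1, the tempered logarithm and exponential reduce to the ordinary log and exp, and the loss reduces to standard softmax cross-entropy.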

Define and register module

Next, we'll define a module class which computes our loss function, and add it to the loss registry for CATEGORY output features with @register_loss. LogitsInputsMixin tells Ludwig that this loss should be called with the output feature logits, which are the feature decoder outputs before normalization to a probability distribution.

@register_loss("tempered_softmax_cross_entropy", [CATEGORY])
class TemperedSoftmaxCrossEntropy(torch.nn.Module, LogitsInputsMixin):

Note

It is possible to define losses on outputs other than logits, but Ludwig does not currently do so. For example, loss could be computed over probabilities, but it is usually more numerically stable to compute it from logits (rather than backpropagating loss through a softmax function).
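The stability point can be seen in a small experiment. This sketch uses an artificially extreme logit gap, chosen only to force underflow, and compares a loss computed from softmax probabilities with torch.nn.functional.cross_entropy, which works directly on logits.

```python
import torch
import torch.nn.functional as F

# One example with two classes and an extreme logit gap; the true class is index 1.
logits = torch.tensor([[1000.0, 0.0]])
target = torch.tensor([1])

# From probabilities: the softmax output for class 1 underflows to 0,
# so taking its log yields an infinite loss.
probabilities = torch.softmax(logits, dim=-1)
loss_from_probabilities = -torch.log(probabilities[0, target[0]])

# From logits: cross_entropy applies log-softmax internally and stays finite.
loss_from_logits = F.cross_entropy(logits, target)

print(loss_from_probabilities.item(), loss_from_logits.item())
```

With these inputs, the probability-based loss should come out infinite while the logit-based loss stays finite. An infinite loss would also produce unusable gradients during training.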

constructor

The loss constructor will receive any parameters specified in the config as kwargs. It must provide reasonable defaults for all arguments.

def __init__(self, t1: float = 1.0, t2: float = 1.0, **kwargs):
    super().__init__()
    self.t1 = t1
    self.t2 = t2
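The following sketch shows how the defaults and **kwargs in this signature behave when the constructor is called with parameters from a config. TemperedArgs is a hypothetical stand-in class that runs without torch, and the loss_section dictionary is an assumed example of a loss section. Exactly which keys Ludwig forwards to the constructor is not specified here, so the whole section is forwarded purely to demonstrate the mechanics.

```python
class TemperedArgs:
    """Hypothetical stand-in with the same constructor signature as the loss module."""

    def __init__(self, t1: float = 1.0, t2: float = 1.0, **kwargs):
        self.t1 = t1
        self.t2 = t2
        self.ignored = kwargs  # Keys the constructor does not name end up here.


# Assumed example of a loss section in a config: t1 is set by the user, t2 is omitted.
loss_section = {"type": "tempered_softmax_cross_entropy", "t1": 0.8}

loss_args = TemperedArgs(**loss_section)

# t1 comes from the config, t2 falls back to its default,
# and the unrecognized "type" key is absorbed by **kwargs instead of raising an error.
print(loss_args.t1, loss_args.t2, loss_args.ignored)
```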

forward

The forward method is responsible for computing the loss. Here we call tempered_softmax_cross_entropy_loss after casting the targets to integer labels, and return its output averaged over the batch.

def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    labels = target.long()
    loss = tempered_softmax_cross_entropy_loss(logits, labels, self.t1, self.t2)
    return torch.mean(loss)
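Putting the pieces together, the sketch below assembles the constructor and forward method into one module and runs it on random data. Two assumptions keep it runnable without Ludwig. The @register_loss decorator and LogitsInputsMixin are omitted. The body of tempered_softmax_cross_entropy_loss is a placeholder that computes ordinary per-example softmax cross-entropy and ignores t1 and t2; it is not an implementation of the bi-tempered loss.

```python
import torch
import torch.nn.functional as F


def tempered_softmax_cross_entropy_loss(
        logits: torch.Tensor,
        labels: torch.Tensor,
        t1: float, t2: float) -> torch.Tensor:
    # Placeholder (assumption): per-example standard cross-entropy, ignoring t1 and t2.
    return F.cross_entropy(logits, labels, reduction="none")


class TemperedSoftmaxCrossEntropy(torch.nn.Module):
    # Registration and LogitsInputsMixin are left out so this runs without Ludwig.
    def __init__(self, t1: float = 1.0, t2: float = 1.0, **kwargs):
        super().__init__()
        self.t1 = t1
        self.t2 = t2

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        labels = target.long()  # Class indices must be integers.
        loss = tempered_softmax_cross_entropy_loss(logits, labels, self.t1, self.t2)
        return torch.mean(loss)  # Reduce the per-example losses to a scalar.


loss_fn = TemperedSoftmaxCrossEntropy()
logits = torch.randn(4, 3, requires_grad=True)  # Batch of 4 examples, 3 classes.
target = torch.tensor([0.0, 2.0, 1.0, 2.0])     # Float labels, cast inside forward.

loss = loss_fn(logits, target)
loss.backward()  # The scalar loss can be backpropagated to the logits.
print(loss.dim(), logits.grad.shape)
```

Returning the batch mean keeps the output a scalar, as the guide requires at the outset, regardless of batch size.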