# Add a Metric
Metrics are used to report model performance during training and evaluation, and also serve as optimization objectives for hyperparameter optimization.
Concretely, metrics are modules which compute a function of the model's output for each batch and aggregate the function's result over all batches. A common example of a metric is the `LossMetric`, which computes the average batch loss. Metrics are defined in `ludwig/modules/metric_modules.py`. Ludwig's metrics are designed to be consistent with torchmetrics and conform to the interface of `torchmetrics.Metric`.
Note

Before implementing a new metric from scratch, check the torchmetrics documentation to see if the desired function is available there. Metrics from torchmetrics can often be added to Ludwig trivially; see `RMSEMetric` in `ludwig/modules/metric_modules.py` for example.
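For instance, wrapping a torchmetrics metric usually amounts to a thin subclass. The following is a minimal sketch in the spirit of `RMSEMetric`, assuming it lives in `ludwig/modules/metric_modules.py` where `LudwigMetric` is in scope; the actual class in the codebase may differ in detail.

```python
from torchmetrics import MeanSquaredError


# Sketch of wrapping a torchmetrics metric, in the spirit of RMSEMetric.
# Assumes this code lives in ludwig/modules/metric_modules.py, where
# LudwigMetric is defined; details may differ from the real implementation.
class RMSEMetric(MeanSquaredError, LudwigMetric):
    """Root mean squared error metric."""

    def __init__(self, **kwargs):
        # squared=False makes MeanSquaredError compute the root of the MSE.
        super().__init__(squared=False, **kwargs)
```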
## 1. Add a new metric class
For the majority of use cases, metrics should be averaged over batches; for this, Ludwig provides a `MeanMetric` class which keeps a running average of its values. The following examples assume averaging is desired and inherit from `MeanMetric`. If you need different aggregation behavior, replace `MeanMetric` with `LudwigMetric` and accumulate the metric values as needed.

We'll use `TokenAccuracyMetric` as an example, which treats each token of a sequence as an independent prediction and computes average accuracy over sequences.
First, declare the new metric class in `ludwig/modules/metric_modules.py`:

```python
class TokenAccuracyMetric(MeanMetric):
```
## 2. Implement required methods

### get_current_value
If using `MeanMetric`, compute the value of the metric given a batch of feature outputs and target values in `get_current_value`.
```python
def get_current_value(self, preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Compute the metric over a batch of predictions (preds) and truth values (target).
    # Aggregate the metric over the batch.
    return metric_value
```
Inputs

- `preds` (torch.Tensor): A batch of outputs from an output feature, which are either predictions, probabilities, or logits depending on the return value of `get_inputs`.
- `target` (torch.Tensor): The batch of true labels for the dataset column corresponding to the metric's output feature.

Return

- (torch.Tensor): The computed metric; in most cases this will be a scalar value.
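As a concrete illustration, a simplified `get_current_value` for `TokenAccuracyMetric` could compare predicted and target token IDs elementwise and average over the whole batch. This is only a sketch, not Ludwig's exact implementation:

```python
def get_current_value(self, preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # preds and target: [batch_size, sequence_length] tensors of token IDs.
    # Each token is treated as an independent prediction.
    correct = preds.eq(target).float()
    # Sketch only: a production implementation would also mask out padding
    # positions before averaging.
    return correct.mean()
```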
### update and reset

If not using `MeanMetric`, implement `update` and `reset` instead of `get_current_value`.
```python
def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
    # Compute the metric over a batch of predictions (preds) and truth values (target).
    # Accumulate metric values or aggregate statistics.
```
Inputs

- `preds` (torch.Tensor): A batch of outputs from an output feature, which are either predictions, probabilities, or logits depending on the return value of `get_inputs`.
- `target` (torch.Tensor): The batch of true labels for the dataset column corresponding to the metric's output feature.
```python
def reset(self) -> None:
    # Reset accumulated values.
```
Note

`MeanMetric`'s `update` method simply delegates metric computation to `get_current_value`.
```python
def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
    self.avg.update(self.get_current_value(preds, target))
```
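For metrics that aggregate statistics rather than average per-batch values, the usual `torchmetrics.Metric` pattern is to register state with `add_state`, which also makes `reset` work automatically. The following hypothetical `ExactMatchMetric` (not part of Ludwig) is a sketch of that pattern:

```python
import torch


# Hypothetical metric, not part of Ludwig: the fraction of sequences in which
# every token is predicted correctly. Assumes LudwigMetric is in scope, as in
# ludwig/modules/metric_modules.py.
class ExactMatchMetric(LudwigMetric):
    def __init__(self):
        super().__init__()
        # State registered with add_state is zeroed by reset() and summed
        # across processes in distributed training.
        self.add_state("matches", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0.0), dist_reduce_fx="sum")

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        # A sequence counts as a match only if all of its tokens are correct.
        self.matches += preds.eq(target).all(dim=-1).sum().float()
        self.total += preds.shape[0]

    def compute(self) -> torch.Tensor:
        # Final metric value aggregated over all batches seen since reset().
        return self.matches / self.total
```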
### get_objective

The return value of `get_objective` tells Ludwig whether to minimize or maximize this metric in hyperparameter optimization.
```python
@classmethod
def get_objective(cls):
    return MAXIMIZE
```
Return

- (str): How this metric should be optimized, one of `MINIMIZE` or `MAXIMIZE`.
### get_inputs

Determines which feature output is passed in to this metric's `update` or `get_current_value` method. Valid return values are:

- `PREDICTIONS`: The predicted values of the output feature.
- `PROBABILITIES`: The vector of probabilities.
- `LOGITS`: The vector of outputs of the feature decoder's final layer (before the application of any sigmoid or softmax function).
```python
@classmethod
def get_inputs(cls):
    return PREDICTIONS
```
Return

- (str): Which output this metric derives its value from, one of `PREDICTIONS`, `PROBABILITIES`, or `LOGITS`.
## 3. Add the new metric class to the registry
Metric names in the config are mapped to metric classes by registering the class in the metric registry, which is defined in `ludwig/modules/metric_registry.py`. To register your class, add the `@register_metric` decorator on the line above its class definition, specifying the name of the metric and a list of the supported output feature types:
```python
@register_metric(TOKEN_ACCURACY, [SEQUENCE, TEXT])
class TokenAccuracyMetric(MeanMetric):
```
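Putting the steps together, the full registered class might look like the sketch below. The method bodies are illustrative only (in particular, `get_current_value` is the simplified version from above), not Ludwig's exact implementation:

```python
@register_metric(TOKEN_ACCURACY, [SEQUENCE, TEXT])
class TokenAccuracyMetric(MeanMetric):
    def get_current_value(self, preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Simplified: average per-token accuracy over the batch, without the
        # padding masking a real implementation would perform.
        return preds.eq(target).float().mean()

    @classmethod
    def get_objective(cls):
        # Higher accuracy is better, so hyperopt should maximize this metric.
        return MAXIMIZE

    @classmethod
    def get_inputs(cls):
        # Compare predicted token IDs directly against target token IDs.
        return PREDICTIONS
```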