output_features list has the same structure of the
input_features list: it is a list of dictionaries containing a
name and a
They represent outputs / targets that you want your model to predict.
In most machine learning tasks you want to predict only one target variable, but in Ludwig you are allowed to specify as many outputs as you want and they are going to be optimized in a multi-task fashion, using a weighted sum of their losses as a combined loss to optimize.
Instead of having
encoders, output features have
decoders, but most of them have only one decoder so you don't have to specify it.
Decoders take the output of the combiner as input, process it further, for instance passing it through fully connected layers, and finally predict values and compute a loss and some measures (depending on the datatype different losses and measures apply).
Decoders have additional parameters, in particular
loss that allows you to specify a different loss to optimize for this specific decoder, for instance numerical features support both
mean_absolute_error as losses.
Details about the available decoders and losses alongside with the description of all parameters will be provided in the datatype-specific sections.
For the sake of simplicity you can imagine the input coming from the combiner to be a vector in most of the cases, but there is a
reduce_input parameter one can specify to change the default behavior.
As Ludwig allows for multiple output features to be specified and each output feature can be seen as a task the model is learning to perform, by consequence Ludwig supports Multi-task learning natively.
When multiple output features are specified, the loss that is optimized is a weighted sum of the losses of each individual output feature.
By default each loss weight is
1, but it can be changed by specifying a value for the
weight parameter in the
loss section of each output feature definition.
For example, given a
B, in order to optimize the loss
loss_total = 1.5 * loss_A + 0.8 + loss_B the
output_feature section of the configuration should look like:
Output Features Dependencies¶
An additional feature that Ludwig provides is the concept of dependency between
output_features. You can specify a list of output features as dependencies when you write the dictionary of a specific feature.
At model building time Ludwig checks that no cyclic dependency exists.
If you do so Ludwig will concatenate all the final representations before the prediction of those output features to the original input of the decoder.
The reason is that if different output features have a causal dependency, knowing which prediction has been made for one can help making the prediction of the other.
For instance if two output features are one coarse grained category and one fine-grained category that are in a hierarchical structure with each other, knowing the prediction made for coarse grained restricts the possible categories to predict for the fine-grained. In this case the following configuration structure can be used:
Assuming the input coming from the combiner has hidden dimension
h 128, there are two fully connected layers that return a vector with hidden size 64 at the end of the
coarse_class decoder (that vector will be used for the final layer before projecting in the output
coarse_class space). In the decoder of
fine_class, the 64 dimensional vector of
coarse_class will be concatenated to the combiner output vector, making a vector of hidden size 192 that will be passed through a fully connected layer and the 64 dimensional output will be used for the final layer before projecting in the output class space of the