⇅ Vector Features

Vector features let you provide an ordered set of numerical values within a single feature.

This is useful for providing pre-trained representations or activations obtained from other models, or for providing multivariate inputs and outputs. One interesting use of vector features is to provide a probability distribution as the output of a multiclass classification problem, instead of a single correct class as with a category feature. Vector output features can also be useful for distillation and noise-aware losses.

Preprocessing¶

The data is expected as whitespace-separated numerical values, for example "1.0 0.0 1.04 10.49". All vectors are expected to be of the same size.
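As a minimal sketch of producing and reading this format, the helpers below join a list of numbers into one whitespace-separated cell and split it back. Both function names are hypothetical and are not part of the library.

```python
from typing import List, Sequence


def to_vector_string(values: Sequence[float]) -> str:
    """Join numbers with single spaces, the format a vector column expects."""
    return " ".join(repr(float(v)) for v in values)


def parse_vector_string(cell: str) -> List[float]:
    """Split on any run of whitespace and convert each token to a float."""
    return [float(token) for token in cell.split()]


cell = to_vector_string([1.0, 0.0, 1.04, 10.49])
print(cell)
print(parse_vector_string(cell))
```

Writing each vector as a single string cell keeps the dataset a flat table, with one column per vector feature.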

preprocessing:
    vector_size: null
    missing_value_strategy: fill_with_const
    fill_value: ''

Parameters:

vector_size (default: null) : The size of the vector. If null, the vector size will be inferred from the data.
missing_value_strategy (default: fill_with_const) : What strategy to follow when there's a missing value in a vector column. Options: fill_with_const, fill_with_mode, bfill, ffill, drop_row. See Missing Value Strategy for details.
fill_value (default: ``): The value to replace missing values with when missing_value_strategy is fill_with_const.
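The inference rule for vector_size can be pictured with the sketch below. It is an illustration only: infer_vector_size is a hypothetical helper, and the assumption that the first row fixes the expected size is ours, not a description of the library's internals.

```python
from typing import Iterable, Optional


def infer_vector_size(cells: Iterable[str], vector_size: Optional[int] = None) -> int:
    """Return the common vector length, or raise if any row disagrees.

    When vector_size is None (null in the config), the first row decides
    the expected size; every later row must match it.
    """
    for index, cell in enumerate(cells):
        length = len(cell.split())
        if vector_size is None:
            vector_size = length  # first row fixes the expected size
        elif length != vector_size:
            raise ValueError(
                f"row {index} has {length} values, expected {vector_size}"
            )
    if vector_size is None:
        raise ValueError("no rows to infer the vector size from")
    return vector_size


print(infer_vector_size(["1.0 0.0 1.04", "2.0 3.5 0.1"]))
```

Running a check like this before training surfaces rows of the wrong length early, since all vectors are expected to share one size.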

Preprocessing parameters can also be defined once and applied to all vector input features using the Type-Global Preprocessing section.

Input Features¶

The vector feature supports two encoders: dense and passthrough.

The encoder parameters specified at the feature level are:

tied (default null): name of the input feature to tie the weights of the encoder with. It needs to be the name of a feature of the same type and with the same encoder parameters.

Example vector feature entry in the input features list:

name: vector_column_name
type: vector
tied: null
encoder: 
    type: dense
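To illustrate the constraint on tied, the sketch below writes two vector input features as Python dictionaries and checks that the tied feature points at an existing feature of the same type with the same encoder parameters. The feature names and the check_tied helper are hypothetical and exist only for this example.

```python
from typing import Any, Dict, List

# Two vector inputs; vector_b shares encoder weights with vector_a.
input_features: List[Dict[str, Any]] = [
    {
        "name": "vector_a",
        "type": "vector",
        "encoder": {"type": "dense", "output_size": 64},
    },
    {
        "name": "vector_b",
        "type": "vector",
        "tied": "vector_a",
        "encoder": {"type": "dense", "output_size": 64},
    },
]


def check_tied(features: List[Dict[str, Any]]) -> None:
    """Raise ValueError if a tied reference breaks the documented rules."""
    by_name = {feature["name"]: feature for feature in features}
    for feature in features:
        target_name = feature.get("tied")
        if target_name is None:
            continue  # not tied to anything
        target = by_name.get(target_name)
        if target is None:
            raise ValueError(f"{feature['name']}: unknown feature {target_name}")
        if target["type"] != feature["type"]:
            raise ValueError(f"{feature['name']}: tied feature has another type")
        if target.get("encoder") != feature.get("encoder"):
            raise ValueError(f"{feature['name']}: encoder parameters differ")


check_tied(input_features)  # passes silently for the list above
```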

The available encoder parameters are:

type (default dense): the possible values are passthrough and dense. passthrough outputs the raw vector values unaltered. dense passes the vector through a stack of fully connected layers to produce an embedding.

Encoder type and encoder parameters can also be defined once and applied to all vector input features using the Type-Global Encoder section.

Encoders¶

Passthrough Encoder¶

encoder:
    type: passthrough

There are no additional parameters for the passthrough encoder.

Dense Encoder¶

For vector features, a dense encoder (stack of fully connected layers) can be used to encode the vector.

encoder:
    type: dense
    dropout: 0.0
    output_size: 256
    norm: null
    num_layers: 1
    activation: relu
    use_bias: true
    bias_initializer: zeros
    weights_initializer: xavier_uniform
    norm_params: null
    fc_layers: null

Parameters:

dropout (default: 0.0) : Default dropout rate applied to fully connected layers. Increasing dropout is a common form of regularization to combat overfitting. The dropout is expressed as the probability of an element to be zeroed out (0.0 means no dropout).
output_size (default: 256) : Size of the output of the feature.
norm (default: null) : Default normalization applied at the beginning of fully connected layers. Options: batch, layer, ghost, null. See Normalization for details.
num_layers (default: 1) : Number of stacked fully connected layers to apply. Increasing layers adds capacity to the model, enabling it to learn more complex feature interactions.
activation (default: relu): Default activation function applied to the output of the fully connected layers. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
use_bias (default: true): Whether the layer uses a bias vector. Options: true, false.
bias_initializer (default: zeros): Initializer for the bias vector. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity. Alternatively it is possible to specify a dictionary with a key type that identifies the type of initializer and other keys for its parameters, e.g. {type: normal, mean: 0, stddev: 0}. For a description of the parameters of each initializer, see torch.nn.init.
weights_initializer (default: xavier_uniform): Initializer for the weight matrix. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity. Alternatively it is possible to specify a dictionary with a key type that identifies the type of initializer and other keys for its parameters, e.g. {type: normal, mean: 0, stddev: 0}. For a description of the parameters of each initializer, see torch.nn.init.
norm_params (default: null): Default parameters passed to the norm module.
fc_layers (default: null): List of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: activation, dropout, norm, norm_params, output_size, use_bias, bias_initializer and weights_initializer. If any of those values is missing from the dictionary, the default one provided as a standalone parameter will be used instead.
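The fallback rule for fc_layers can be sketched as a dictionary merge: each layer starts from the standalone encoder parameters and overrides only the keys it sets. The resolve_fc_layers helper is hypothetical, and using num_layers empty layers when fc_layers is null is our reading of the parameter descriptions above.

```python
from typing import Any, Dict, List

# Per-layer keys listed in the fc_layers description.
LAYER_KEYS = (
    "activation", "dropout", "norm", "norm_params",
    "output_size", "use_bias", "bias_initializer", "weights_initializer",
)


def resolve_fc_layers(encoder: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Fill every layer dictionary with the encoder-level defaults."""
    defaults = {key: encoder.get(key) for key in LAYER_KEYS}
    layers = encoder.get("fc_layers")
    if layers is None:
        # Assumption: with no explicit list, num_layers identical layers are used.
        layers = [{} for _ in range(encoder.get("num_layers", 1))]
    # Keys set on a layer win over the standalone defaults.
    return [{**defaults, **layer} for layer in layers]


encoder = {
    "type": "dense",
    "output_size": 256,
    "dropout": 0.0,
    "activation": "relu",
    "fc_layers": [{"output_size": 128}, {"dropout": 0.1}],
}
for layer in resolve_fc_layers(encoder):
    print(layer["output_size"], layer["dropout"], layer["activation"])
```

Here the first layer overrides only output_size and the second only dropout, so everything else comes from the encoder-level values.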

Output Features¶

graph LR
  A["Combiner Output"] --> B["Fully\n Connected\n Layers"];
  B --> C["Projection into\nVector Size"] --> D["Softmax"];
  subgraph DEC["DECODER.."]
  B
  C
  D
  end

Vector features can be used when multi-class classification needs to be performed with a noise-aware loss or when the task is multivariate regression.

There is only one decoder available for vector features: a (potentially empty) stack of fully connected layers, followed by a projection into a tensor of the vector size (optionally followed by a softmax in the case of multi-class classification).
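The last two stages of that decoder can be sketched in plain Python: a linear projection to the vector size, then an optional softmax. The weights below are made-up numbers for illustration, and the function names are hypothetical rather than library calls.

```python
import math
from typing import List, Sequence


def project(hidden: Sequence[float],
            weights: Sequence[Sequence[float]],
            bias: Sequence[float]) -> List[float]:
    """Linear projection: one output per row of the weight matrix."""
    return [
        sum(h * w for h, w in zip(hidden, row)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


hidden = [1.0, 2.0]                             # stand-in for the combiner output
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # projects to a vector of size 3
bias = [0.0, 0.0, -1.0]

logits = project(hidden, weights, bias)
print(logits)            # suited to multivariate regression
print(softmax(logits))   # sums to 1, suited to predicting a distribution
```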

Example vector output feature using default parameters:

name: vector_column_name
type: vector
reduce_input: sum
dependencies: []
reduce_dependencies: sum
loss:
    type: mean_squared_error
decoder:
    type: projector

Parameters:

reduce_input (default sum): defines how to reduce an input that is not a vector, but a matrix or a higher order tensor, on the first dimension (second if you count the batch dimension). Available values are: sum, mean or avg, max, concat (concatenates along the first dimension), last (returns the last vector of the first dimension).
dependencies (default []): the output features this one is dependent on. For a detailed explanation refer to Output Features Dependencies.
reduce_dependencies (default sum): defines how to reduce the output of a dependent feature that is not a vector, but a matrix or a higher order tensor, on the first dimension (second if you count the batch dimension). Available values are: sum, mean or avg, max, concat (concatenates along the first dimension), last (returns the last vector of the first dimension).
softmax (default false): determines whether to apply a softmax at the end of the decoder. It is useful for predicting a vector of values that sum to 1 and can be interpreted as probabilities.
loss (default {type: mean_squared_error}): a dictionary containing a loss type. The available loss types are mean_squared_error, mean_absolute_error, mean_absolute_percentage_error, huber and softmax_cross_entropy (use the last one only if softmax is true). See Loss for details.
decoder (default: {"type": "projector"}): Decoder for the desired task. Options: projector. See Decoder for details.
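The reduction modes named for reduce_input and reduce_dependencies can be sketched on a small matrix whose rows are the entries along the first (non-batch) dimension. The reduce_sequence function is a hypothetical stand-in written for this example.

```python
from typing import List, Sequence


def reduce_sequence(matrix: Sequence[Sequence[float]], mode: str = "sum") -> List[float]:
    """Reduce a (length x size) matrix to a single vector."""
    columns = list(zip(*matrix))  # group values by position in the vector
    if mode == "sum":
        return [sum(column) for column in columns]
    if mode in ("mean", "avg"):
        return [sum(column) / len(column) for column in columns]
    if mode == "max":
        return [max(column) for column in columns]
    if mode == "concat":
        return [value for row in matrix for value in row]  # rows end to end
    if mode == "last":
        return list(matrix[-1])
    raise ValueError(f"unknown reduce mode: {mode}")


matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
for mode in ("sum", "mean", "max", "concat", "last"):
    print(mode, reduce_sequence(matrix, mode))
```

Note that concat is the only mode that changes the output size: it grows with the length of the first dimension.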

Decoders¶

Projector¶

decoder:
    type: projector
    num_fc_layers: 0
    fc_output_size: 256
    fc_norm: null
    fc_dropout: 0.0
    output_size: null
    fc_activation: relu
    activation: null
    fc_layers: null
    fc_use_bias: true
    fc_weights_initializer: xavier_uniform
    fc_bias_initializer: zeros
    fc_norm_params: null
    use_bias: true
    weights_initializer: xavier_uniform
    bias_initializer: zeros
    clip: null
    multiplier: 1.0

Parameters:

num_fc_layers (default: 0) : Number of fully-connected layers if fc_layers not specified. Increasing layers adds capacity to the model, enabling it to learn more complex feature interactions.
fc_output_size (default: 256) : Output size of fully connected stack.
fc_norm (default: null) : Default normalization applied at the beginning of fully connected layers. Options: batch, layer, ghost, null. See Normalization for details.
fc_dropout (default: 0.0) : Default dropout rate applied to fully connected layers. Increasing dropout is a common form of regularization to combat overfitting. The dropout is expressed as the probability of an element to be zeroed out (0.0 means no dropout).
output_size (default: null) : Size of the output of the decoder.
fc_activation (default: relu): Default activation function applied to the output of the fully connected layers. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
activation (default: null): Indicates the activation function applied to the output. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
fc_layers (default: null): List of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: activation, dropout, norm, norm_params, output_size, use_bias, bias_initializer and weights_initializer. If any of those values is missing from the dictionary, the default one provided as a standalone parameter will be used instead.
fc_use_bias (default: true): Whether the layer uses a bias vector in the fc_stack. Options: true, false.
fc_weights_initializer (default: xavier_uniform): The weights initializer to use for the layers in the fc_stack.
fc_bias_initializer (default: zeros): The bias initializer to use for the layers in the fc_stack.
fc_norm_params (default: null): Default parameters passed to the norm module.
use_bias (default: true): Whether the layer uses a bias vector. Options: true, false.
weights_initializer (default: xavier_uniform): Initializer for the weight matrix. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
bias_initializer (default: zeros): Initializer for the bias vector. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
clip (default: null): Clip the output of the decoder to be within the given range.
multiplier (default: 1.0): Multiplier to scale the activated outputs by. Useful when activation is set to a function whose output lies in [-1, 1], like tanh, to re-scale values back to the order of magnitude of the data you're trying to predict. A good rule of thumb in such cases is to pick a value like x * (max - min), where x is a scalar in the range [1, 2]. For example, if you're trying to predict something like temperature, it might make sense to pick a multiplier on the order of 100.
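The rule of thumb for multiplier, together with clip, can be sketched as follows. The order used here (activation, then multiplier, then clip) is an assumption made for the example, and both function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def rule_of_thumb_multiplier(minimum: float, maximum: float, x: float = 1.0) -> float:
    """x * (max - min), with x chosen in [1, 2]."""
    return x * (maximum - minimum)


def scale_output(pre_activation: Sequence[float],
                 multiplier: float = 1.0,
                 clip: Optional[Tuple[float, float]] = None) -> List[float]:
    """Apply tanh, scale by the multiplier, then optionally clip to a range."""
    values = [math.tanh(v) * multiplier for v in pre_activation]
    if clip is not None:
        low, high = clip
        values = [min(max(v, low), high) for v in values]
    return values


# Temperatures between -20 and 80 degrees give a multiplier of 100.
multiplier = rule_of_thumb_multiplier(-20.0, 80.0)
print(multiplier)
print(scale_output([-0.3, 0.0, 2.0], multiplier=multiplier, clip=(-20.0, 80.0)))
```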

Decoder type and decoder parameters can also be defined once and applied to all vector output features using the Type-Global Decoder section.

Loss¶

Mean Squared Error (MSE)¶

loss:
    type: mean_squared_error
    weight: 1.0

Parameters:

weight (default: 1.0): Weight of the loss.

Mean Absolute Error (MAE)¶

loss:
    type: mean_absolute_error
    weight: 1.0

Parameters:

weight (default: 1.0): Weight of the loss.

Mean Absolute Percentage Error (MAPE)¶

loss:
    type: mean_absolute_percentage_error
    weight: 1.0

Parameters:

weight (default: 1.0): Weight of the loss.
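For reference, the three regression losses above follow the usual textbook definitions, sketched here for a single predicted vector. This version of MAPE returns a fraction rather than a percentage and assumes no target value is zero; the library's implementation may handle those details differently.

```python
from typing import Sequence


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def mean_absolute_error(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def mean_absolute_percentage_error(pred: Sequence[float], target: Sequence[float]) -> float:
    # Assumes every target value is nonzero.
    return sum(abs((t - p) / t) for p, t in zip(pred, target)) / len(target)


pred, target = [2.0, 4.0], [1.0, 5.0]
weight = 1.0  # the loss weight scales the feature's contribution to the total loss
print(weight * mean_squared_error(pred, target))
print(weight * mean_absolute_error(pred, target))
print(weight * mean_absolute_percentage_error(pred, target))
```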

Softmax Cross Entropy¶

loss:
    type: softmax_cross_entropy
    class_weights: null
    weight: 1.0
    robust_lambda: 0
    confidence_penalty: 0
    class_similarities: null
    class_similarities_temperature: 0

Parameters:

class_weights (default: null) : Weights to apply to each class in the loss. If not specified, all classes are weighted equally. The value can be a vector of weights, one for each class, each of which multiplies the loss of the datapoints that have that class as ground truth. It is an alternative to oversampling in case of an unbalanced class distribution. The ordering of the vector follows the category to integer ID mapping in the JSON metadata file (the <UNK> class needs to be included too). Alternatively, the value can be a dictionary with class strings as keys and weights as values, like {class_a: 0.5, class_b: 0.7, ...}.
weight (default: 1.0): Weight of the loss.
robust_lambda (default: 0): Replaces the loss with (1 - robust_lambda) * loss + robust_lambda / c where c is the number of classes. Useful in case of noisy labels.
confidence_penalty (default: 0): Penalizes overconfident (low entropy) predictions by adding the term a * (max_entropy - entropy) / max_entropy to the loss, where a is the value of this parameter. Useful in case of noisy labels.
class_similarities (default: null): If not null, it is a c x c matrix in the form of a list of lists that contains the mutual similarity of classes. It is used if class_similarities_temperature is greater than 0. The ordering of the rows and columns follows the category to integer ID mapping in the JSON metadata file (the <UNK> class needs to be included too).
class_similarities_temperature (default: 0): The temperature parameter of the softmax that is performed on each row of class_similarities. The output of that softmax is used to determine the supervision vector to provide instead of the one hot vector that would be provided otherwise for each datapoint. The intuition behind it is that errors between similar classes are more tolerable than errors between really different classes.
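The formulas behind robust_lambda, confidence_penalty and class_similarities_temperature can be sketched in plain Python. The first two follow the expressions given above; for the similarity targets, dividing each row by the temperature before the softmax is an assumption of this sketch. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def robust_loss(loss: float, robust_lambda: float, num_classes: int) -> float:
    """(1 - robust_lambda) * loss + robust_lambda / c."""
    return (1 - robust_lambda) * loss + robust_lambda / num_classes


def confidence_penalty(probs: Sequence[float], a: float) -> float:
    """a * (max_entropy - entropy) / max_entropy for one predicted distribution."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(probs))  # entropy of the uniform distribution
    return a * (max_entropy - entropy) / max_entropy


def similarity_targets(class_similarities: Sequence[Sequence[float]],
                       temperature: float) -> List[List[float]]:
    """Row-wise softmax of the similarity matrix; row i replaces the one-hot of class i."""
    targets = []
    for row in class_similarities:
        scaled = [s / temperature for s in row]  # assumed temperature scaling
        peak = max(scaled)
        exps = [math.exp(s - peak) for s in scaled]
        total = sum(exps)
        targets.append([e / total for e in exps])
    return targets


print(robust_loss(2.0, robust_lambda=0.1, num_classes=4))
print(confidence_penalty([0.97, 0.01, 0.01, 0.01], a=0.5))
print(similarity_targets([[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]], 0.5))
```

With the example matrix, the first two classes are similar, so each one's supervision vector puts noticeable mass on the other and little on the third.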

Huber Loss¶

loss:
    type: huber
    weight: 1.0
    delta: 1.0

Parameters:

weight (default: 1.0): Weight of the loss.
delta (default: 1.0): Threshold at which to change between delta-scaled L1 and L2 loss.
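The role of delta can be seen in a short sketch of the standard Huber formula: quadratic for errors up to delta, linear beyond it. This follows the common definition (0.5 * d^2 below the threshold, delta * (d - 0.5 * delta) above it) and is not copied from the library's code.

```python
from typing import Sequence


def huber_loss(pred: Sequence[float], target: Sequence[float], delta: float = 1.0) -> float:
    """Mean Huber loss over the elements of one vector."""
    total = 0.0
    for p, t in zip(pred, target):
        error = abs(p - t)
        if error <= delta:
            total += 0.5 * error * error              # L2 region
        else:
            total += delta * (error - 0.5 * delta)    # delta-scaled L1 region
    return total / len(target)


print(huber_loss([0.5], [0.0]))        # small error, quadratic branch
print(huber_loss([3.0], [0.0]))        # large error, linear branch
print(huber_loss([3.0], [0.0], 5.0))   # a larger delta keeps it quadratic
```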

Loss type and loss related parameters can also be defined once and applied to all vector output features using the Type-Global Loss section.

Metrics¶

The metrics that are calculated every epoch and are available for vector features are mean_squared_error, mean_absolute_error, r2, and the loss itself.
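Of these metrics, r2 is the least self-explanatory. A minimal sketch of the usual coefficient of determination is shown below, computed over flattened values; how the library aggregates across vector dimensions and batches is not specified here, so treat the flattening as an assumption.

```python
from typing import Sequence


def r2_score(pred: Sequence[float], target: Sequence[float]) -> float:
    """1 - SS_res / SS_tot; 1.0 is a perfect fit, 0.0 matches predicting the mean."""
    mean_target = sum(target) / len(target)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, target))
    ss_tot = sum((t - mean_target) ** 2 for t in target)
    return 1 - ss_res / ss_tot


target = [1.0, 2.0, 3.0]
print(r2_score([1.0, 2.0, 3.0], target))  # perfect predictions
print(r2_score([2.0, 2.0, 2.0], target))  # always predicting the mean
```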

You can set any of them as validation_metric in the training section of the configuration if you set the validation_field to be the name of a vector feature.