Number Features

Preprocessing

Number features are transformed directly into a float-valued vector of length n (where n is the size of the dataset) and added to the HDF5 file with a key that reflects the name of the column in the dataset. No additional information about them is stored in the JSON metadata file.

preprocessing:
    missing_value_strategy: fill_with_const
    normalization: zscore
    outlier_strategy: null
    fill_value: 0.0
    outlier_threshold: 3.0

Parameters:

  • missing_value_strategy (default: fill_with_const) : What strategy to follow when there's a missing value in a number column. Options: fill_with_const, fill_with_mode, bfill, ffill, drop_row, fill_with_mean. See Missing Value Strategy for details.
  • normalization (default: zscore) : Normalization strategy to use for this number feature. If the value is null, no normalization is performed. Options: zscore, minmax, log1p, iq, null. See Normalization for details.
  • outlier_strategy (default: null) : Determines how outliers will be handled in the dataset. In most cases, replacing outliers with the column mean (fill_with_mean) will be sufficient, but in others the outliers may be damaging enough to merit dropping the entire row of data (drop_row). In some cases, the best way to handle outliers is to leave them in the data, which is the behavior when this parameter is left as null. Options: fill_with_const, fill_with_mode, bfill, ffill, drop_row, fill_with_mean, null.
  • fill_value (default: 0.0): The value to replace missing values with when missing_value_strategy is fill_with_const.
  • outlier_threshold (default: 3.0): Standard deviations from the mean past which a value is considered an outlier. The 3-sigma rule in statistics tells us that when data is normally distributed, roughly 95% of the data lies within 2 standard deviations of the mean and more than 99% lies within 3 standard deviations (see: 68–95–99.7 rule). As such, anything farther away than that is highly likely to be an outlier, and may distort the learning process by disproportionately affecting the model.
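
For example, to replace any value more than 2.5 standard deviations from the mean with the column mean (the 2.5 threshold here is illustrative), the two parameters can be combined:

preprocessing:
    outlier_strategy: fill_with_mean
    outlier_threshold: 2.5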

Preprocessing parameters can also be defined once and applied to all number input features using the Type-Global Preprocessing section.
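
For instance, a minimal sketch of such a type-global override, assuming the top-level defaults section described in that part of the documentation:

defaults:
    number:
        preprocessing:
            normalization: minmax
            missing_value_strategy: fill_with_mean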

Normalization

Technique to be used when normalizing the number feature types.

Options:

  • null: No normalization is performed.
  • zscore: The mean and standard deviation are computed so that values are shifted to have zero mean and 1 standard deviation.
  • minmax: The minimum is subtracted from values and the result is divided by difference between maximum and minimum.
  • log1p: The value returned is the natural log of 1 plus the original value. Note: log1p is only defined for values greater than -1.
  • iq: The median is subtracted from values and the result is divided by the interquartile range (IQR), i.e., the 75th percentile value minus the 25th percentile value. The resulting data has zero median. Because the median and IQR are robust to extreme values, this normalization is useful if your feature has large outliers, since the scaling won't be skewed by those values.
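
Concretely, writing mean, std, min, max, median, and IQR for the per-column statistics, the options above compute:

zscore:  x' = (x - mean) / std
minmax:  x' = (x - min) / (max - min)
log1p:   x' = ln(1 + x)
iq:      x' = (x - median) / IQR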

The best normalization technique to use depends on the distribution of your data, but zscore is a good place to start in many cases.

Input Features

Number features have two encoders. One encoder (passthrough) simply returns the raw numerical values coming from the input placeholders as outputs. Inputs are of size b while outputs are of size b x 1, where b is the batch size. The other encoder (dense) passes the raw numerical values through fully connected layers. In this case the inputs of size b are transformed to size b x h, where h is the output size of the last fully connected layer.

The encoder parameters specified at the feature level are:

  • tied (default null): name of the input feature to tie the weights of the encoder with. It needs to be the name of a feature of the same type and with the same encoder parameters.

Example number feature entry in the input features list:

name: number_column_name
type: number
tied: null
encoder: 
    type: dense
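
For instance, tied could be used to share encoder weights between two number features of the same type and encoder configuration. A sketch (the feature names are hypothetical):

- name: price_usd
  type: number
  encoder:
      type: dense
- name: price_eur
  type: number
  tied: price_usd
  encoder:
      type: dense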

The available encoder parameters:

  • type (default passthrough): the possible values are passthrough and dense. passthrough outputs the raw numerical values unaltered, while dense passes them through a stack of fully connected layers (see Dense Encoder below).

Encoder type and encoder parameters can also be defined once and applied to all number input features using the Type-Global Encoder section.

Encoders

Passthrough Encoder

encoder:
    type: passthrough

There are no additional parameters for the passthrough encoder.

Dense Encoder

encoder:
    type: dense
    dropout: 0.0
    output_size: 256
    norm: null
    num_layers: 1
    activation: relu
    use_bias: true
    bias_initializer: zeros
    weights_initializer: xavier_uniform
    norm_params: null
    fc_layers: null

Parameters:

  • dropout (default: 0.0) : Default dropout rate applied to fully connected layers. Increasing dropout is a common form of regularization to combat overfitting. The dropout is expressed as the probability of an element to be zeroed out (0.0 means no dropout).
  • output_size (default: 256) : Size of the output of the feature.
  • norm (default: null) : Default normalization applied at the beginning of fully connected layers. Options: batch, layer, ghost, null. See Normalization for details.
  • num_layers (default: 1) : Number of stacked fully connected layers to apply. Increasing layers adds capacity to the model, enabling it to learn more complex feature interactions.
  • activation (default: relu): Default activation function applied to the output of the fully connected layers. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
  • use_bias (default: true): Whether the layer uses a bias vector. Options: true, false.
  • bias_initializer (default: zeros): Initializer for the bias vector. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity. Alternatively it is possible to specify a dictionary with a key type that identifies the type of initializer and other keys for its parameters, e.g. {type: normal, mean: 0, stddev: 0}. For a description of the parameters of each initializer, see torch.nn.init.

  • weights_initializer (default: xavier_uniform): Initializer for the weight matrix. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity. Alternatively it is possible to specify a dictionary with a key type that identifies the type of initializer and other keys for its parameters, e.g. {type: normal, mean: 0, stddev: 0}. For a description of the parameters of each initializer, see torch.nn.init.

  • norm_params (default: null): Default parameters passed to the norm module.

  • fc_layers (default: null): List of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: activation, dropout, norm, norm_params, output_size, use_bias, bias_initializer and weights_initializer. If any of those values is missing from the dictionary, the default one provided as a standalone parameter will be used instead.
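
For example, a sketch of fc_layers defining a two-layer stack with per-layer overrides (the sizes and dropout values are illustrative):

encoder:
    type: dense
    activation: relu
    fc_layers:
        - output_size: 128
          dropout: 0.1
        - output_size: 64
          norm: batch

Any parameter omitted from a layer dictionary falls back to the standalone default, so the first layer here uses relu with no norm, while the second uses relu with batch norm and no dropout.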

Output Features

Number features can be used when a regression needs to be performed. There is only one decoder available for number features: a (potentially empty) stack of fully connected layers, followed by a projection to a single number.

Example number output feature using default parameters:

name: number_column_name
type: number
reduce_input: sum
dependencies: []
reduce_dependencies: sum
loss:
    type: mean_squared_error
decoder:
    type: regressor

Parameters:

  • reduce_input (default sum): defines how to reduce an input that is not a vector, but a matrix or a higher order tensor, on the first dimension (second if you count the batch dimension). Available values are: sum, mean or avg, max, concat (concatenates along the first dimension), last (returns the last vector of the first dimension).
  • dependencies (default []): the output features this one is dependent on. For a detailed explanation refer to Output Feature Dependencies.
  • reduce_dependencies (default sum): defines how to reduce the output of a dependent feature that is not a vector, but a matrix or a higher order tensor, on the first dimension (second if you count the batch dimension). Available values are: sum, mean or avg, max, concat (concatenates along the first dimension), last (returns the last vector of the first dimension).
  • loss (default {type: mean_squared_error}): is a dictionary containing a loss type. Options: mean_squared_error, mean_absolute_error, mean_absolute_percentage_error, root_mean_squared_error, root_mean_squared_percentage_error, huber. See Loss for details.
  • decoder (default: {"type": "regressor"}): Decoder for the desired task. Options: regressor. See Decoder for details.

Decoders

Regressor

decoder:
    type: regressor
    num_fc_layers: 0
    fc_output_size: 256
    fc_norm: null
    fc_dropout: 0.0
    fc_activation: relu
    fc_layers: null
    fc_use_bias: true
    fc_weights_initializer: xavier_uniform
    fc_bias_initializer: zeros
    fc_norm_params: null
    use_bias: true
    weights_initializer: xavier_uniform
    bias_initializer: zeros

Parameters:

  • num_fc_layers (default: 0) : Number of fully-connected layers if fc_layers not specified. Increasing layers adds capacity to the model, enabling it to learn more complex feature interactions.
  • fc_output_size (default: 256) : Output size of fully connected stack.
  • fc_norm (default: null) : Default normalization applied at the beginning of fully connected layers. Options: batch, layer, ghost, null. See Normalization for details.
  • fc_dropout (default: 0.0) : Default dropout rate applied to fully connected layers. Increasing dropout is a common form of regularization to combat overfitting. The dropout is expressed as the probability of an element to be zeroed out (0.0 means no dropout).
  • fc_activation (default: relu): Default activation function applied to the output of the fully connected layers. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
  • fc_layers (default: null): List of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: activation, dropout, norm, norm_params, output_size, use_bias, bias_initializer and weights_initializer. If any of those values is missing from the dictionary, the default one provided as a standalone parameter will be used instead.
  • fc_use_bias (default: true): Whether the layer uses a bias vector in the fc_stack. Options: true, false.
  • fc_weights_initializer (default: xavier_uniform): The weights initializer to use for the layers in the fc_stack.
  • fc_bias_initializer (default: zeros): The bias initializer to use for the layers in the fc_stack.
  • fc_norm_params (default: null): Default parameters passed to the norm module.
  • use_bias (default: true): Whether the layer uses a bias vector. Options: true, false.
  • weights_initializer (default: xavier_uniform): Initializer for the weight matrix. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
  • bias_initializer (default: zeros): Initializer for the bias vector. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
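
For example, to give the decoder extra capacity with a small fully connected stack (the values are illustrative):

decoder:
    type: regressor
    num_fc_layers: 2
    fc_output_size: 128
    fc_dropout: 0.1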

Decoder type and decoder parameters can also be defined once and applied to all number output features using the Type-Global Decoder section.

Loss

Mean Squared Error (MSE)

loss:
    type: mean_squared_error
    weight: 1.0

Parameters:

  • weight (default: 1.0): Weight of the loss.

Mean Absolute Error (MAE)

loss:
    type: mean_absolute_error
    weight: 1.0

Parameters:

  • weight (default: 1.0): Weight of the loss.

Mean Absolute Percentage Error (MAPE)

loss:
    type: mean_absolute_percentage_error
    weight: 1.0

Parameters:

  • weight (default: 1.0): Weight of the loss.

Root Mean Squared Error (RMSE)

loss:
    type: root_mean_squared_error
    weight: 1.0

Parameters:

  • weight (default: 1.0): Weight of the loss.

Root Mean Squared Percentage Error (RMSPE)

loss:
    type: root_mean_squared_percentage_error
    weight: 1.0

Parameters:

  • weight (default: 1.0): Weight of the loss.

Huber Loss

loss:
    type: huber
    weight: 1.0
    delta: 1.0

Parameters:

  • weight (default: 1.0): Weight of the loss.
  • delta (default: 1.0): Threshold at which to change between delta-scaled L1 and L2 loss.
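
Concretely, with residual a = y - y_hat, the standard Huber formulation (as in torch.nn.HuberLoss) is:

loss = 0.5 * a^2                     if |a| <= delta
loss = delta * (|a| - 0.5 * delta)   otherwise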

Loss and loss related parameters can also be defined once and applied to all number output features using the Type-Global Loss section.

Metrics

The metrics that are calculated every epoch and available for number features are mean_squared_error, mean_absolute_error, root_mean_squared_error, root_mean_squared_percentage_error, and the loss itself. You can set any of them as the validation_metric in the training section of the configuration if you set the validation_field to the name of a number feature.
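
For example, to select the best model by RMSE on a number output (the feature name follows the example above):

training:
    validation_field: number_column_name
    validation_metric: root_mean_squared_error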