Binary Features

Preprocessing

Binary features are directly transformed into a binary-valued vector of length n (where n is the size of the dataset) and added to the HDF5 file with a key that reflects the name of the column in the dataset.

preprocessing:
    missing_value_strategy: fill_with_false
    fallback_true_label: null
    fill_value: null

Parameters:

missing_value_strategy (default: fill_with_false) : What strategy to follow when there's a missing value in a binary column. Options: fill_with_const, fill_with_mode, bfill, ffill, drop_row, fill_with_false, fill_with_true. See Missing Value Strategy for details.
fallback_true_label (default: null): The label to interpret as 1 (True) when the binary feature doesn't have a conventional boolean value.
fill_value (default: null): The value to replace missing values with when missing_value_strategy is fill_with_const.
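The parameters above can be illustrated with a minimal sketch of how a raw column might be turned into a boolean vector. It is not the library's implementation: the helper name preprocess_binary and the TRUE_STRS/FALSE_STRS sets of "conventional" boolean strings are hypothetical, and only some of the missing value strategies are sketched.

```python
from typing import List, Optional

# Hypothetical sets of "conventional" boolean strings; the library's own list may differ.
TRUE_STRS = {"true", "t", "yes", "y", "1", "1.0"}
FALSE_STRS = {"false", "f", "no", "n", "0", "0.0"}


def preprocess_binary(
    values: List[Optional[str]],
    missing_value_strategy: str = "fill_with_false",
    fallback_true_label: Optional[str] = None,
    fill_value: Optional[str] = None,
) -> List[bool]:
    """Map a raw column to a list of booleans (hypothetical helper, sketch only)."""
    # 1. Handle missing values (None). fill_with_mode, bfill and ffill are omitted here.
    if missing_value_strategy == "fill_with_false":
        values = ["false" if v is None else v for v in values]
    elif missing_value_strategy == "fill_with_true":
        values = ["true" if v is None else v for v in values]
    elif missing_value_strategy == "fill_with_const":
        if fill_value is None:
            raise ValueError("fill_with_const requires fill_value")
        values = [fill_value if v is None else v for v in values]
    elif missing_value_strategy == "drop_row":
        values = [v for v in values if v is not None]
    else:
        raise NotImplementedError(missing_value_strategy)

    # 2. Conventional boolean strings are mapped directly.
    normalized = [str(v).strip().lower() for v in values]
    if all(v in TRUE_STRS or v in FALSE_STRS for v in normalized):
        return [v in TRUE_STRS for v in normalized]

    # 3. Otherwise the column must have exactly two labels, and fallback_true_label
    #    picks the one that becomes True (raising here is this sketch's choice).
    distinct = set(values)
    if len(distinct) != 2 or fallback_true_label not in distinct:
        raise ValueError("non-boolean column: set fallback_true_label to one of its two labels")
    return [v == fallback_true_label for v in values]
```

For example, preprocess_binary(["cat", "dog", "cat"], fallback_true_label="dog") yields [False, True, False].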

Preprocessing parameters can also be defined once and applied to all binary input features using the Type-Global Preprocessing section.
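As a rough illustration of how a type-global setting interacts with per-feature settings, the sketch below merges a type-level preprocessing section into each input feature, letting feature-level values win. The top-level key "defaults" and the nesting defaults -> binary -> preprocessing are assumptions made for this example, as is the helper name apply_type_defaults; the merge is shallow.

```python
import copy
from typing import Any, Dict


def apply_type_defaults(config: Dict[str, Any]) -> Dict[str, Any]:
    """Fill each input feature's preprocessing from a type-level section.

    Hypothetical sketch: assumes type-global values live under
    config["defaults"][<feature type>]["preprocessing"]; feature-level keys win.
    """
    resolved = copy.deepcopy(config)  # never mutate the caller's config
    type_defaults = resolved.get("defaults", {})
    for feature in resolved.get("input_features", []):
        section = type_defaults.get(feature["type"], {}).get("preprocessing", {})
        # Shallow merge: keys set on the feature override the type-level ones.
        feature["preprocessing"] = {**section, **feature.get("preprocessing", {})}
    return resolved


config = {
    "defaults": {"binary": {"preprocessing": {"missing_value_strategy": "fill_with_true"}}},
    "input_features": [
        {"name": "a", "type": "binary"},
        {"name": "b", "type": "binary", "preprocessing": {"missing_value_strategy": "drop_row"}},
    ],
}
resolved = apply_type_defaults(config)
```

Here feature a inherits fill_with_true while feature b keeps its own drop_row.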

Input Features

Binary features have two encoders, passthrough and dense. The encoder to use is selected with the type parameter:

type (default passthrough): the possible values are passthrough and dense. passthrough outputs the raw binary (0/1) values unaltered. dense passes them through a stack of trainable, randomly initialized fully connected layers.

The encoder parameters specified at the feature level are:

tied (default null): name of another input feature to tie the weights of the encoder with. It needs to be the name of a feature of the same type and with the same encoder parameters.
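The tied constraint can be expressed as a small validation sketch. check_tied is a hypothetical helper, and comparing the raw encoder dictionaries is a simplification: it ignores defaults that a real config would fill in.

```python
from typing import Dict, List


def check_tied(features: List[Dict]) -> None:
    """Raise ValueError if a `tied` reference breaks the rule above (hypothetical)."""
    by_name = {f["name"]: f for f in features}
    for f in features:
        target_name = f.get("tied")
        if target_name is None:
            continue  # this feature is not tied to anything
        target = by_name.get(target_name)
        if target is None:
            raise ValueError(f"{f['name']}: tied feature {target_name!r} not found")
        if target["type"] != f["type"]:
            raise ValueError(f"{f['name']}: tied feature must have the same type")
        # Simplification: compare encoder sections literally.
        if target.get("encoder", {}) != f.get("encoder", {}):
            raise ValueError(f"{f['name']}: tied feature must use the same encoder parameters")
```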

Example binary feature entry in the input features list:

name: binary_column_name
type: binary
tied: null
encoder:
    type: dense

Encoder type and encoder parameters can also be defined once and applied to all binary input features using the Type-Global Encoder section.

Encoders

Passthrough Encoder

The passthrough encoder passes raw binary values through without any transformation. Inputs of size b are transformed to outputs of size b x 1, where b is the batch size.
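The shape change is the only thing the passthrough encoder does, so a sketch is just a reshape. The helper name is hypothetical, and casting booleans to float is an assumption about what downstream layers consume.

```python
from typing import List


def passthrough_encode(batch: List[bool]) -> List[List[float]]:
    """Reshape a batch of b binary values into a b x 1 matrix (hypothetical sketch)."""
    # Each value becomes a one-element row; no learnable parameters are involved.
    return [[float(x)] for x in batch]
```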

encoder:
    type: passthrough
    skip: false
    adapter: null

There are no additional parameters for the passthrough encoder.

Dense Encoder

The dense encoder passes the raw binary values through a stack of fully connected layers. Inputs of size b are transformed to size b x h, where b is the batch size and h is the output size of the last fully connected layer.
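A minimal sketch of the default setup (num_layers: 1, relu activation, xavier_uniform weights, zero bias) shows how a batch of b scalars becomes a b x h matrix. The class name is hypothetical; dropout and normalization are omitted, and plain Python stands in for the tensor library the real encoder would use.

```python
import random
from typing import List


class DenseBinaryEncoderSketch:
    """One fully connected layer mapping b binary inputs to b x h outputs (sketch)."""

    def __init__(self, output_size: int = 256, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Input dimension is 1 (one binary value per row), so the weight matrix is 1 x h.
        # Xavier-uniform bound with gain 1: sqrt(6 / (fan_in + fan_out)).
        bound = (6.0 / (1 + output_size)) ** 0.5
        self.weights = [rng.uniform(-bound, bound) for _ in range(output_size)]
        self.bias = [0.0] * output_size  # bias_initializer: zeros

    def __call__(self, batch: List[bool]) -> List[List[float]]:
        # relu(x * w + b) for every output unit of every row.
        return [[max(0.0, float(x) * w + b) for w, b in zip(self.weights, self.bias)]
                for x in batch]


encoder = DenseBinaryEncoderSketch(output_size=8)
out = encoder([True, False])  # shape: 2 x 8
```

With zero bias, a False input (0.0) maps to an all-zero row, which is one reason use_bias defaults to true.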

encoder:
    type: dense
    dropout: 0.0
    output_size: 256
    norm: null
    num_layers: 1
    activation: relu
    use_bias: true
    bias_initializer: zeros
    weights_initializer: xavier_uniform
    norm_params: null
    fc_layers: null
    skip: false
    adapter: null

Parameters:

dropout (default: 0.0) : Default dropout rate applied to fully connected layers. Increasing dropout is a common form of regularization to combat overfitting. The dropout is expressed as the probability of an element to be zeroed out (0.0 means no dropout).
output_size (default: 256) : Size of the output of the feature.
norm (default: null) : Default normalization applied at the beginning of fully connected layers. Options: batch, layer, ghost, null.
num_layers (default: 1) : Number of stacked fully connected layers to apply. Increasing layers adds capacity to the model, enabling it to learn more complex feature interactions.
activation (default: relu): Default activation function applied to the output of the fully connected layers. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, gelu, silu, swish, mish, selu, prelu, relu6, hardswish, hardsigmoid, softplus, celu, swiglu, geglu, reglu, sparsemax, entmax15, null.
use_bias (default: true): Whether the layer uses a bias vector.
bias_initializer (default: zeros): Initializer for the bias vector. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity. Alternatively it is possible to specify a dictionary with a key type that identifies the type of initializer and other keys for its parameters, e.g. {type: normal, mean: 0, stddev: 0}. For a description of the parameters of each initializer, see torch.nn.init.
weights_initializer (default: xavier_uniform): Initializer for the weight matrix. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity. Alternatively it is possible to specify a dictionary with a key type that identifies the type of initializer and other keys for its parameters, e.g. {type: normal, mean: 0, stddev: 0}. For a description of the parameters of each initializer, see torch.nn.init.
norm_params (default: null): Default parameters passed to the norm module.
fc_layers (default: null): List of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: activation, dropout, norm, norm_params, output_size, use_bias, bias_initializer and weights_initializer. If any of those values is missing from the dictionary, the default one provided as a standalone parameter will be used instead.
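The fallback rule for fc_layers can be sketched as a dictionary merge. resolve_fc_layers is a hypothetical helper; the case where fc_layers is null and num_layers copies of the standalone defaults are stacked is this sketch's reading of num_layers.

```python
from typing import Any, Dict, List

# Per-layer keys that an fc_layers entry may override (from the list above).
LAYER_KEYS = ("activation", "dropout", "norm", "norm_params", "output_size",
              "use_bias", "bias_initializer", "weights_initializer")


def resolve_fc_layers(encoder: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one parameter dict per stacked layer (hypothetical sketch)."""
    defaults = {k: encoder[k] for k in LAYER_KEYS if k in encoder}
    fc_layers = encoder.get("fc_layers")
    if fc_layers is None:
        # No explicit list: stack num_layers identical layers built from the defaults.
        return [dict(defaults) for _ in range(encoder.get("num_layers", 1))]
    # Explicit list: its length sets the depth; missing keys fall back to the defaults.
    return [{**defaults, **layer} for layer in fc_layers]


layers = resolve_fc_layers({
    "output_size": 256, "dropout": 0.0, "activation": "relu", "num_layers": 1,
    "fc_layers": [{"output_size": 64}, {"output_size": 32, "dropout": 0.2}],
})
```

Here the list has two entries, so two layers are built even though num_layers is 1.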
skip (default: false):
adapter (default: null):

Output Features

Binary output features can be used when a binary classification needs to be performed or when the output is a single probability. There is only one decoder available: regressor.

Example binary output feature using default parameters:

name: binary_column_name
type: binary
reduce_input: sum
dependencies: []
calibration: false
reduce_dependencies: sum
threshold: 0.5
decoder:
  type: regressor
  fc_layers: null
  num_fc_layers: 0
  fc_output_size: 256
  fc_use_bias: true
  fc_weights_initializer: xavier_uniform
  fc_bias_initializer: zeros
  fc_norm: null
  fc_norm_params: null
  fc_activation: relu
  fc_dropout: 0.0
  input_size: null
  use_bias: true
  weights_initializer: xavier_uniform
  bias_initializer: zeros
loss:
  type: binary_weighted_cross_entropy
  weight: 1.0
  positive_class_weight: null
  robust_lambda: 0
  confidence_penalty: 0

Parameters:

reduce_input (default sum): defines how to reduce an input that is not a vector, but a matrix or a higher order tensor, on the first dimension (second if you count the batch dimension). Available values are: sum, mean or avg, max, concat (concatenates along the first dimension), last (returns the last vector of the first dimension).
dependencies (default []): the output features this one is dependent on. For a detailed explanation refer to Output Features Dependencies.
calibration (default false): if true, performs calibration by temperature scaling after training is complete. Calibration uses the validation set to find a scale factor (temperature) by which the logits are divided, shifting output probabilities closer to true likelihoods.
reduce_dependencies (default sum): defines how to reduce the output of a dependent feature that is not a vector, but a matrix or a higher order tensor, on the first dimension (second if you count the batch dimension). Available values are: sum, mean or avg, max, concat (concatenates along the first dimension), last (returns the last vector of the first dimension).
threshold (default 0.5): The threshold at or above which the predicted output of the sigmoid function is mapped to 1.
loss (default {"type": "binary_weighted_cross_entropy"}): is a dictionary containing a loss type. binary_weighted_cross_entropy is the only supported loss type for binary output features. See Loss for details.
decoder (default: {"type": "regressor"}): Decoder for the desired task. Options: regressor. See Decoder for details.
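The reduction modes listed for reduce_input and reduce_dependencies can be sketched on a single example (batch dimension omitted), where the input is a seq_len x hidden matrix. The function name is hypothetical and plain lists stand in for tensors.

```python
from typing import List

Vector = List[float]


def reduce_sequence(x: List[Vector], mode: str = "sum") -> Vector:
    """Reduce a seq_len x hidden matrix to one vector (hypothetical sketch)."""
    if mode == "sum":
        return [sum(col) for col in zip(*x)]
    if mode in ("mean", "avg"):
        return [sum(col) / len(x) for col in zip(*x)]
    if mode == "max":
        return [max(col) for col in zip(*x)]
    if mode == "concat":
        # Flatten along the first dimension: length becomes seq_len * hidden.
        return [v for row in x for v in row]
    if mode == "last":
        return list(x[-1])
    raise ValueError(f"unknown reduce mode: {mode}")


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
```

With this x, sum gives [9.0, 6.0] and last gives [5.0, 0.0].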

Decoder type and decoder parameters can also be defined once and applied to all binary output features using the Type-Global Decoder section.

Decoders

Regressor

Diagram: Combiner Output → Fully Connected Layers → Projection into Output Space → Sigmoid (the last three stages form the decoder).

The regressor decoder is a (potentially empty) stack of fully connected layers, followed by a projection to a single number and a sigmoid function.

decoder:
    type: regressor
    use_bias: true
    weights_initializer: xavier_uniform
    bias_initializer: zeros
    fc_layers: null
    num_fc_layers: 0
    fc_output_size: 256
    fc_use_bias: true
    fc_weights_initializer: xavier_uniform
    fc_bias_initializer: zeros
    fc_norm: null
    fc_norm_params: null
    fc_activation: relu
    fc_dropout: 0.0

Parameters:

use_bias (default: true): Whether the layer uses a bias vector.
weights_initializer (default: xavier_uniform): Initializer for the weight matrix. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
bias_initializer (default: zeros): Initializer for the bias vector. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
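fc_layers (default: null): List of dictionaries containing the parameters of each fully connected layer applied before the projection, analogous to fc_layers in the dense encoder.
num_fc_layers (default: 0): Number of fully connected layers stacked before the projection. With 0, the stack is empty and the combiner output is projected directly.
fc_output_size (default: 256): Output size of each fully connected layer.
fc_use_bias (default: true): Whether the fully connected layers use a bias vector.
fc_weights_initializer (default: xavier_uniform): Initializer for the weight matrices of the fully connected layers. Same options as weights_initializer.
fc_bias_initializer (default: zeros): Initializer for the bias vectors of the fully connected layers. Same options as bias_initializer.
fc_norm (default: null): Normalization applied in the fully connected layers, as for norm in the dense encoder.
fc_norm_params (default: null): Parameters passed to the normalization module of the fully connected layers.
fc_activation (default: relu): Activation function applied to the output of each fully connected layer.
fc_dropout (default: 0.0): Dropout rate applied in the fully connected layers.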
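The regressor pipeline described above (optional fully connected stack, projection to one logit, sigmoid, then threshold) can be sketched as follows. The class is hypothetical, dropout and normalization are omitted, and the temperature argument only mimics the temperature-scaling calibration step by dividing the logit before the sigmoid.

```python
import math
import random
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def xavier_uniform(fan_in: int, fan_out: int, rng: random.Random) -> List[List[float]]:
    # fan_out rows of fan_in weights, drawn from U(-bound, bound).
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]


class RegressorDecoderSketch:
    """FC stack (possibly empty) -> projection to one logit -> sigmoid (sketch)."""

    def __init__(self, input_size: int, num_fc_layers: int = 0,
                 fc_output_size: int = 256, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.fc = []  # (weight rows, bias) pairs; empty when num_fc_layers == 0
        size = input_size
        for _ in range(num_fc_layers):
            self.fc.append((xavier_uniform(size, fc_output_size, rng), [0.0] * fc_output_size))
            size = fc_output_size
        self.proj_w = xavier_uniform(size, 1, rng)[0]  # projection into a single number
        self.proj_b = 0.0  # bias_initializer: zeros

    def logit(self, h: List[float]) -> float:
        for weights, bias in self.fc:
            # fc_activation: relu
            h = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
                 for row, b in zip(weights, bias)]
        return sum(w * x for w, x in zip(self.proj_w, h)) + self.proj_b

    def predict(self, h: List[float], threshold: float = 0.5,
                temperature: float = 1.0) -> bool:
        prob = sigmoid(self.logit(h) / temperature)
        return prob >= threshold  # "greater or equal" maps to 1 (True)


decoder = RegressorDecoderSketch(input_size=4, num_fc_layers=2, fc_output_size=8)
```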

Loss

Binary Weighted Cross Entropy

loss:
    type: binary_weighted_cross_entropy
    positive_class_weight: null
    weight: 1.0
    robust_lambda: 0
    confidence_penalty: 0

Parameters:

positive_class_weight (default: null) : Weight of the positive class.
weight (default: 1.0): Weight of the loss.
robust_lambda (default: 0): Replaces the loss with (1 - robust_lambda) * loss + robust_lambda / c where c is the number of classes. Useful in case of noisy labels.
confidence_penalty (default: 0): Penalizes overconfident (low-entropy) predictions by adding a * (max_entropy - entropy) / max_entropy to the loss, where a is the value of this parameter. Useful in case of noisy labels.
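A per-example sketch of how these parameters could combine is shown below. Assumptions, not confirmed by this page: positive_class_weight scales only the positive term (as pos_weight does in PyTorch's BCEWithLogitsLoss), robust_lambda is applied before the confidence penalty, and weight scales the final value. The function name is hypothetical; log-probabilities are computed with a stable softplus.

```python
import math
from typing import Optional


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def binary_weighted_cross_entropy(
    logit: float,
    target: int,
    positive_class_weight: Optional[float] = None,
    weight: float = 1.0,
    robust_lambda: float = 0.0,
    confidence_penalty: float = 0.0,
) -> float:
    """Loss for one example (hypothetical sketch of the parameters above)."""
    log_p = -softplus(-logit)       # log(sigmoid(logit))
    log_1mp = -softplus(logit)      # log(1 - sigmoid(logit))
    pos_w = 1.0 if positive_class_weight is None else positive_class_weight
    loss = -(pos_w * target * log_p + (1 - target) * log_1mp)
    # robust_lambda: (1 - robust_lambda) * loss + robust_lambda / c, with c = 2 classes.
    loss = (1 - robust_lambda) * loss + robust_lambda / 2
    if confidence_penalty:
        p = math.exp(log_p)
        entropy = -(p * log_p + (1 - p) * log_1mp)
        max_entropy = math.log(2)  # entropy of a 50/50 prediction
        loss += confidence_penalty * (max_entropy - entropy) / max_entropy
    return weight * loss
```

At logit 0 (probability 0.5) the loss for a positive target is log 2, and the confidence penalty adds nothing because the prediction already has maximal entropy.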

Loss and loss related parameters can also be defined once and applied to all binary output features using the Type-Global Loss section.

Metrics

The metrics calculated every epoch for binary features are accuracy, loss, precision, recall, roc_auc and specificity.
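For reference, the non-loss metrics can be computed from labels and predicted probabilities as in the sketch below. binary_metrics is a hypothetical helper; ROC AUC is computed as the probability that a random positive scores higher than a random negative (ties count one half), and returning 0.0 on empty denominators is this sketch's choice.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_prob: List[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall, specificity and ROC AUC (hypothetical sketch)."""
    y_pred = [int(p >= threshold) for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    pos = [s for t, s in zip(y_true, y_prob) if t == 1]
    neg = [s for t, s in zip(y_true, y_prob) if t == 0]
    # Pairwise (Mann-Whitney) formulation of ROC AUC.
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "roc_auc": wins / (len(pos) * len(neg)) if pos and neg else 0.0,
    }


m = binary_metrics([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1])
```

With the default threshold of 0.5 this example gives accuracy 0.75, precision 1.0, recall 0.5, specificity 1.0 and roc_auc 0.75.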

You can set any of these to be the validation_metric in the training section of the configuration if the validation_field is set as the name of a binary feature.