# Binary Features
## Preprocessing
Binary features are directly transformed into a binary-valued vector of length `n` (where `n` is the size of the dataset) and added to the HDF5 with a key that reflects the name of the column in the dataset.

```yaml
preprocessing:
    missing_value_strategy: fill_with_false
    fallback_true_label: null
    fill_value: null
```
Parameters:

- `missing_value_strategy` (default: `fill_with_false`): What strategy to follow when there is a missing value in a binary column. Options: `fill_with_const`, `fill_with_mode`, `bfill`, `ffill`, `drop_row`, `fill_with_false`. See Missing Value Strategy for details.
- `fallback_true_label` (default: `null`): The label to interpret as 1 (True) when the binary feature doesn't have a conventional boolean value.
- `fill_value` (default: `null`): The value to replace missing values with when `missing_value_strategy` is `fill_with_const`.
Preprocessing parameters can also be defined once and applied to all binary input features using the Type-Global Preprocessing section.
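As an illustration of the preprocessing semantics above, here is a minimal sketch (the helper name `binarize_column` and its value sets are invented for this example; this is not Ludwig's actual implementation):

```python
# Conventional boolean spellings; anything else is "unconventional".
FALSY = {"false", "f", "no", "n", "0", "0.0"}
TRUTHY = {"true", "t", "yes", "y", "1", "1.0"}

def binarize_column(values, fallback_true_label=None,
                    missing_value_strategy="fill_with_false", fill_value=None):
    """Sketch of binary-feature preprocessing: map raw values to 0/1."""
    out = []
    for v in values:
        if v is None:  # missing value
            if missing_value_strategy == "fill_with_false":
                out.append(0)
            elif missing_value_strategy == "fill_with_const":
                out.append(int(fill_value))
            elif missing_value_strategy == "drop_row":
                continue
            else:
                raise NotImplementedError(missing_value_strategy)
        else:
            s = str(v).strip().lower()
            if s in TRUTHY:
                out.append(1)
            elif s in FALSY:
                out.append(0)
            else:
                # Unconventional label: compare against fallback_true_label
                out.append(1 if s == str(fallback_true_label).lower() else 0)
    return out
```

For instance, a column with values `"human"`/`"bot"` has no conventional boolean spelling, so `fallback_true_label="human"` decides which label maps to 1.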
## Input Features
Binary features have two encoders, `passthrough` and `dense`. The encoder to use can be specified with the `type` parameter:

- `type` (default: `passthrough`): the possible values are `passthrough` and `dense`. `passthrough` outputs the raw integer values unaltered, while `dense` randomly initializes a trainable embedding matrix.

The encoder parameters specified at the feature level are:

- `tied` (default: `null`): name of another input feature to tie the weights of the encoder with. It needs to be the name of a feature of the same type and with the same encoder parameters.
Example binary feature entry in the input features list:

```yaml
name: binary_column_name
type: binary
tied: null
encoder:
    type: dense
```
Encoder type and encoder parameters can also be defined once and applied to all binary input features using the Type-Global Encoder section.
## Encoders

### Passthrough Encoder
The `passthrough` encoder passes through raw binary values without any transformation. Inputs of size `b` are transformed to outputs of size `b x 1`, where `b` is the batch size.

```yaml
encoder:
    type: passthrough
```

There are no additional parameters for the `passthrough` encoder.
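The shape change is just an unsqueeze from `b` to `b x 1`; as a one-line sketch (hypothetical helper, not Ludwig code):

```python
def passthrough_encode(batch):
    """Passthrough encoder sketch: a batch of b scalars becomes a
    b x 1 matrix, with the values themselves unaltered."""
    return [[float(x)] for x in batch]
```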
### Dense Encoder
The `dense` encoder passes the raw binary values through a fully connected layer. Inputs of size `b` are transformed to size `b x h`, where `h` is the output size of the layer.

```yaml
encoder:
    type: dense
    dropout: 0.0
    output_size: 256
    norm: null
    num_layers: 1
    activation: relu
    use_bias: true
    bias_initializer: zeros
    weights_initializer: xavier_uniform
    norm_params: null
    fc_layers: null
```
Parameters:

- `dropout` (default: `0.0`): Default dropout rate applied to fully connected layers. Increasing dropout is a common form of regularization to combat overfitting. The dropout is expressed as the probability of an element being zeroed out (0.0 means no dropout).
- `output_size` (default: `256`): Size of the output of the feature.
- `norm` (default: `null`): Default normalization applied at the beginning of fully connected layers. Options: `batch`, `layer`, `ghost`, `null`.
- `num_layers` (default: `1`): Number of stacked fully connected layers to apply. Increasing the number of layers adds capacity to the model, enabling it to learn more complex feature interactions.
- `activation` (default: `relu`): Default activation function applied to the output of the fully connected layers. Options: `elu`, `leakyRelu`, `logSigmoid`, `relu`, `sigmoid`, `tanh`, `softmax`, `null`.
- `use_bias` (default: `true`): Whether the layer uses a bias vector. Options: `true`, `false`.
- `bias_initializer` (default: `zeros`): Initializer for the bias vector. Options: `uniform`, `normal`, `constant`, `ones`, `zeros`, `eye`, `dirac`, `xavier_uniform`, `xavier_normal`, `kaiming_uniform`, `kaiming_normal`, `orthogonal`, `sparse`, `identity`. Alternatively it is possible to specify a dictionary with a key `type` that identifies the type of initializer and other keys for its parameters, e.g. `{type: normal, mean: 0, stddev: 0}`. For a description of the parameters of each initializer, see torch.nn.init.
- `weights_initializer` (default: `xavier_uniform`): Initializer for the weight matrix. Options: `uniform`, `normal`, `constant`, `ones`, `zeros`, `eye`, `dirac`, `xavier_uniform`, `xavier_normal`, `kaiming_uniform`, `kaiming_normal`, `orthogonal`, `sparse`, `identity`. Alternatively it is possible to specify a dictionary with a key `type` that identifies the type of initializer and other keys for its parameters, e.g. `{type: normal, mean: 0, stddev: 0}`. For a description of the parameters of each initializer, see torch.nn.init.
- `norm_params` (default: `null`): Default parameters passed to the `norm` module.
- `fc_layers` (default: `null`): List of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: `activation`, `dropout`, `norm`, `norm_params`, `output_size`, `use_bias`, `bias_initializer` and `weights_initializer`. If any of those values is missing from the dictionary, the default one provided as a standalone parameter will be used instead.
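Conceptually, the default single-layer `dense` encoder is one fully connected layer applied to the `b x 1` input. A pure-Python sketch of the shape transformation (using a toy uniform weight init rather than `xavier_uniform`, and ignoring `norm` and `dropout`; the function name is invented for this example):

```python
import random

def dense_encode(batch, output_size=4, seed=0):
    """Dense encoder sketch: project each scalar input to `output_size`
    units through one fully connected layer with ReLU activation."""
    rng = random.Random(seed)
    # One input unit -> output_size units (toy init, not xavier_uniform)
    w = [rng.uniform(-1, 1) for _ in range(output_size)]
    b = [0.0] * output_size  # matches the `zeros` bias_initializer
    return [[max(0.0, x * w[j] + b[j]) for j in range(output_size)]
            for x in batch]
```

A batch of 2 inputs with `output_size=4` thus yields a `2 x 4` output.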
## Output Features

Binary output features can be used when binary classification needs to be performed or when the output is a single probability. There is only one decoder available: `regressor`.
Example binary output feature using default parameters:

```yaml
name: binary_column_name
type: binary
reduce_input: sum
dependencies: []
calibration: false
reduce_dependencies: sum
threshold: 0.5
decoder:
    type: regressor
    fc_layers: null
    num_fc_layers: 0
    fc_output_size: 256
    fc_use_bias: true
    fc_weights_initializer: xavier_uniform
    fc_bias_initializer: zeros
    fc_norm: null
    fc_norm_params: null
    fc_activation: relu
    fc_dropout: 0.0
    input_size: null
    use_bias: true
    weights_initializer: xavier_uniform
    bias_initializer: zeros
loss:
    type: binary_weighted_cross_entropy
    weight: 1.0
    positive_class_weight: null
    robust_lambda: 0
    confidence_penalty: 0
```
Parameters:

- `reduce_input` (default: `sum`): defines how to reduce an input that is not a vector, but a matrix or a higher-order tensor, on the first dimension (second if you count the batch dimension). Available values are: `sum`, `mean` or `avg`, `max`, `concat` (concatenates along the first dimension) and `last` (returns the last vector of the first dimension).
- `dependencies` (default: `[]`): the output features this one depends on. For a detailed explanation refer to Output Features Dependencies.
- `calibration` (default: `false`): if `true`, performs calibration by temperature scaling after training is complete. Calibration uses the validation set to find a scale factor (temperature) which is multiplied with the logits to shift output probabilities closer to true likelihoods.
- `reduce_dependencies` (default: `sum`): defines how to reduce the output of a dependent feature that is not a vector, but a matrix or a higher-order tensor, on the first dimension (second if you count the batch dimension). Available values are: `sum`, `mean` or `avg`, `max`, `concat` (concatenates along the first dimension) and `last` (returns the last vector of the first dimension).
- `threshold` (default: `0.5`): the threshold at or above which the predicted output of the sigmoid function is mapped to 1.
- `loss` (default: `{"type": "binary_weighted_cross_entropy"}`): a dictionary containing a loss `type`. `binary_weighted_cross_entropy` is the only supported loss type for binary output features. See Loss for details.
- `decoder` (default: `{"type": "regressor"}`): Decoder for the desired task. Options: `regressor`. See Decoder for details.
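The `threshold` semantics can be sketched directly (a hypothetical helper, not Ludwig's API): the decoder's logit is squashed through a sigmoid and the resulting probability is compared against the threshold.

```python
import math

def predict(logit, threshold=0.5):
    """Map a raw decoder logit to (probability, hard 0/1 prediction).
    The prediction is 1 when sigmoid(logit) >= threshold."""
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, int(prob >= threshold)
```

For example, a logit of 0.0 gives probability 0.5, which with the default threshold of 0.5 is mapped to 1 (the comparison is greater or equal).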
Decoder type and decoder parameters can also be defined once and applied to all binary output features using the Type-Global Decoder section.
## Decoders

### Regressor

```mermaid
graph LR
  A["Combiner\n Output"] --> B["Fully\n Connected\n Layers"];
  B --> C["Projection into\n Output Space"];
  C --> D["Sigmoid"];
  subgraph DEC["DECODER.."]
  B
  C
  D
  end
```
The regressor decoder is a (potentially empty) stack of fully connected layers, followed by a projection to a single number and a sigmoid function.
```yaml
decoder:
    type: regressor
    num_fc_layers: 0
    fc_output_size: 256
    fc_norm: null
    fc_dropout: 0.0
    fc_activation: relu
    fc_layers: null
    fc_use_bias: true
    fc_weights_initializer: xavier_uniform
    fc_bias_initializer: zeros
    fc_norm_params: null
    use_bias: true
    weights_initializer: xavier_uniform
    bias_initializer: zeros
```
Parameters:

- `num_fc_layers` (default: `0`): Number of fully connected layers if `fc_layers` is not specified. Increasing the number of layers adds capacity to the model, enabling it to learn more complex feature interactions.
- `fc_output_size` (default: `256`): Output size of the fully connected stack.
- `fc_norm` (default: `null`): Default normalization applied at the beginning of fully connected layers. Options: `batch`, `layer`, `ghost`, `null`.
- `fc_dropout` (default: `0.0`): Default dropout rate applied to fully connected layers. Increasing dropout is a common form of regularization to combat overfitting. The dropout is expressed as the probability of an element being zeroed out (0.0 means no dropout).
- `fc_activation` (default: `relu`): Default activation function applied to the output of the fully connected layers. Options: `elu`, `leakyRelu`, `logSigmoid`, `relu`, `sigmoid`, `tanh`, `softmax`, `null`.
- `fc_layers` (default: `null`): List of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: `activation`, `dropout`, `norm`, `norm_params`, `output_size`, `use_bias`, `bias_initializer` and `weights_initializer`. If any of those values is missing from the dictionary, the default one provided as a standalone parameter will be used instead.
- `fc_use_bias` (default: `true`): Whether the layers in the fully connected stack use a bias vector. Options: `true`, `false`.
- `fc_weights_initializer` (default: `xavier_uniform`): The weights initializer to use for the layers in the fully connected stack.
- `fc_bias_initializer` (default: `zeros`): The bias initializer to use for the layers in the fully connected stack.
- `fc_norm_params` (default: `null`): Default parameters passed to the `norm` module.
- `use_bias` (default: `true`): Whether the layer uses a bias vector. Options: `true`, `false`.
- `weights_initializer` (default: `xavier_uniform`): Initializer for the weight matrix. Options: `uniform`, `normal`, `constant`, `ones`, `zeros`, `eye`, `dirac`, `xavier_uniform`, `xavier_normal`, `kaiming_uniform`, `kaiming_normal`, `orthogonal`, `sparse`, `identity`.
- `bias_initializer` (default: `zeros`): Initializer for the bias vector. Options: `uniform`, `normal`, `constant`, `ones`, `zeros`, `eye`, `dirac`, `xavier_uniform`, `xavier_normal`, `kaiming_uniform`, `kaiming_normal`, `orthogonal`, `sparse`, `identity`.
## Loss

### Binary Weighted Cross Entropy

```yaml
loss:
    type: binary_weighted_cross_entropy
    positive_class_weight: null
    weight: 1.0
    robust_lambda: 0
    confidence_penalty: 0
```
Parameters:

- `positive_class_weight` (default: `null`): Weight of the positive class.
- `weight` (default: `1.0`): Weight of the loss.
- `robust_lambda` (default: `0`): Replaces the loss with `(1 - robust_lambda) * loss + robust_lambda / c`, where `c` is the number of classes. Useful in case of noisy labels.
- `confidence_penalty` (default: `0`): Penalizes overconfident predictions (low entropy) by adding an `a * (max_entropy - entropy) / max_entropy` term to the loss, where `a` is the value of this parameter. Useful in case of noisy labels.
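Putting the terms above together for a single (probability, target) pair gives the following sketch. This follows the formulas as described (with `c = 2` classes for a binary feature); it is an illustration, not Ludwig's implementation:

```python
import math

def binary_weighted_cross_entropy(prob, target, weight=1.0,
                                  positive_class_weight=None,
                                  robust_lambda=0.0, confidence_penalty=0.0):
    """Sketch of the binary weighted cross entropy loss described above."""
    pos_w = 1.0 if positive_class_weight is None else positive_class_weight
    eps = 1e-12  # avoid log(0)
    bce = -(pos_w * target * math.log(prob + eps)
            + (1 - target) * math.log(1 - prob + eps))
    # robust_lambda: mix the loss with a uniform-over-classes term (c = 2)
    loss = (1 - robust_lambda) * bce + robust_lambda / 2
    # confidence_penalty: a * (max_entropy - entropy) / max_entropy
    if confidence_penalty:
        entropy = -(prob * math.log(prob + eps)
                    + (1 - prob) * math.log(1 - prob + eps))
        max_entropy = math.log(2)
        loss += confidence_penalty * (max_entropy - entropy) / max_entropy
    return weight * loss
```

Note how `positive_class_weight` scales only the positive-target term, which is useful for imbalanced datasets, while `confidence_penalty` grows as the predicted probability moves away from 0.5.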
Loss and loss related parameters can also be defined once and applied to all binary output features using the Type-Global Loss section.
## Metrics

The metrics that are calculated every epoch and are available for binary features are `accuracy`, `loss`, `precision`, `recall`, `roc_auc` and `specificity`.

You can set any of these to be the `validation_metric` in the `training` section of the configuration if the `validation_field` is set as the name of a binary feature.
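These metrics follow the standard confusion-matrix definitions; here is a small sketch for hard 0/1 predictions (`roc_auc` is omitted since it requires ranked scores rather than hard predictions; the helper name is invented for this example):

```python
def binary_metrics(targets, predictions):
    """Compute accuracy, precision, recall and specificity from 0/1 labels."""
    tp = sum(1 for t, p in zip(targets, predictions) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(targets, predictions) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(targets, predictions) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(targets, predictions) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(targets),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```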