Vector Features
Vector features provide an ordered set of numerical values in a single column. This is useful for feeding pre-trained representations or activations obtained from other models, or for providing multivariate inputs and outputs. An interesting use of vector features is the possibility of providing a probability distribution as the output for a multi-class classification problem, instead of just the correct class as with category features. This is useful for distillation and noise-aware losses.
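For instance, soft labels produced by a teacher model could be predicted with a vector output feature whose decoder applies a softmax and uses a softmax cross entropy loss (both described in the decoder section below). A minimal sketch of such an output feature entry, assuming a hypothetical column soft_labels containing whitespace-separated class probabilities:
name: soft_labels  # hypothetical column of teacher-provided class probabilities
type: vector
softmax: true
loss:
    type: softmax_cross_entropy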
Vector Feature Preprocessing
The data is expected as whitespace-separated numerical values, e.g. "1.0 0.0 1.04 10.49". All vectors in a column are expected to have the same size.
Preprocessing parameters:
- vector_size (default null): size of the vector. If not provided, it will be inferred from the data.
- missing_value_strategy (default fill_with_const): what strategy to follow when there is a missing value. The value should be one of fill_with_const (replaces the missing value with a specific value specified with the fill_value parameter), fill_with_mode (replaces the missing values with the most frequent value in the column), fill_with_mean (replaces the missing values with the mean of the values in the column), backfill (replaces the missing values with the next valid value).
- fill_value (default ""): the value to replace the missing values with when missing_value_strategy is fill_with_const.
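A sketch of how these parameters could appear in the preprocessing section of an input feature entry (the column name and values are illustrative, and the fill value is assumed to use the same whitespace-separated format as the data):
name: vector_column_name
type: vector
preprocessing:
    vector_size: 4  # e.g. for vectors like "1.0 0.0 1.04 10.49"
    missing_value_strategy: fill_with_const
    fill_value: "0.0 0.0 0.0 0.0"  # assumed format: whitespace-separated values, like the data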
Vector Feature Encoders
The vector feature supports two encoders: dense and passthrough. Only the dense encoder has additional parameters, which are described below.
Dense Encoder
For vector features, you can use a dense encoder (stack of fully connected layers). It takes the following parameters:
- layers (default null): a list of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: fc_size, norm, activation and regularize. If any of those values is missing from the dictionary, the default one specified as a parameter of the encoder will be used instead. If both layers and num_layers are null, a default list will be assigned to layers with the value [{fc_size: 512}, {fc_size: 256}] (only applies if reduce_output is not null).
- num_layers (default 0): the number of stacked fully connected layers.
- fc_size (default 256): if a fc_size is not already specified in layers, this is the default fc_size that will be used for each layer. It indicates the size of the output of a fully connected layer.
- use_bias (default true): boolean, whether the layer uses a bias vector.
- weights_initializer (default glorot_uniform): initializer for the weights matrix. Options are: constant, identity, zeros, ones, orthogonal, normal, uniform, truncated_normal, variance_scaling, glorot_normal, glorot_uniform, xavier_normal, xavier_uniform, he_normal, he_uniform, lecun_normal, lecun_uniform. Alternatively it is possible to specify a dictionary with a key type that identifies the type of initializer and other keys for its parameters, e.g. {type: normal, mean: 0, stddev: 0}. To know the parameters of each initializer, please refer to TensorFlow's documentation.
- bias_initializer (default zeros): initializer for the bias vector. The options are the same as for weights_initializer.
- weights_regularizer (default null): regularizer function applied to the weights matrix. Valid values are l1, l2 or l1_l2.
- bias_regularizer (default null): regularizer function applied to the bias vector. Valid values are l1, l2 or l1_l2.
- activity_regularizer (default null): regularizer function applied to the output of the layer. Valid values are l1, l2 or l1_l2.
- norm (default null): if a norm is not already specified in layers, this is the default norm that will be used for each layer. It indicates the normalization applied to the output and can be null, batch or layer.
- norm_params (default null): parameters used if norm is either batch or layer. For information on the parameters used with batch see TensorFlow's documentation on batch normalization, and for layer see TensorFlow's documentation on layer normalization.
- activation (default relu): if an activation is not already specified in layers, this is the default activation that will be used for each layer. It indicates the activation function applied to the output.
- dropout (default 0): dropout rate.
Example vector feature entry in the input features list using a dense encoder:
name: vector_column_name
type: vector
encoder: dense
layers: null
num_layers: 0
fc_size: 256
use_bias: true
weights_initializer: glorot_uniform
bias_initializer: zeros
weights_regularizer: null
bias_regularizer: null
activity_regularizer: null
norm: null
norm_params: null
activation: relu
dropout: 0
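Alternatively, layers can be given explicitly as a list of per-layer dictionaries that override the encoder-level defaults, as in this sketch (layer sizes and activation chosen arbitrarily):
name: vector_column_name
type: vector
encoder: dense
layers:
    -
        fc_size: 512
    -
        fc_size: 256
        activation: tanh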
Vector Feature Decoders
Vector features can be used when multi-class classification needs to be performed with a noise-aware loss or when the task is multivariate regression. There is only one decoder available for vector features: a (potentially empty) stack of fully connected layers, followed by a projection into the output vector space (optionally followed by a softmax in the case of multi-class classification).
+--------------+   +---------+   +-----------+
|Combiner      |   |Fully    |   |Projection |   +------------------+
|Output        +--->Connected+--->into Output+--->Softmax (optional)|
|Representation|   |Layers   |   |Space      |   +------------------+
+--------------+   +---------+   +-----------+
These are the available parameters of a vector output feature:
- reduce_input (default sum): defines how to reduce an input that is not a vector, but a matrix or a higher order tensor, on the first dimension (second if you count the batch dimension). Available values are: sum, mean or avg, max, concat (concatenates along the first dimension), last (returns the last vector of the first dimension).
- dependencies (default []): the output features this one is dependent on. For a detailed explanation refer to Output Features Dependencies.
- reduce_dependencies (default sum): defines how to reduce the output of a dependent feature that is not a vector, but a matrix or a higher order tensor, on the first dimension (second if you count the batch dimension). Available values are: sum, mean or avg, max, concat (concatenates along the first dimension), last (returns the last vector of the first dimension).
- softmax (default false): determines whether to apply a softmax at the end of the decoder. It is useful for predicting a vector of values that sum up to 1 and can be interpreted as probabilities.
- loss (default {type: mean_squared_error}): a dictionary containing a loss type. The available loss types are mean_squared_error, mean_absolute_error and softmax_cross_entropy (use it only if softmax is true).
These are the available parameters of a vector output feature decoder:
- fc_layers (default null): a list of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: fc_size, norm, activation, dropout, initializer and regularize. If any of those values is missing from the dictionary, the default one specified as a parameter of the decoder will be used instead.
- num_fc_layers (default 0): the number of stacked fully connected layers that the input to the feature passes through. Their output is projected in the feature's output space.
- fc_size (default 256): if a fc_size is not already specified in fc_layers, this is the default fc_size that will be used for each layer. It indicates the size of the output of a fully connected layer.
- use_bias (default true): boolean, whether the layer uses a bias vector.
- weights_initializer (default glorot_uniform): initializer for the weights matrix. Options are: constant, identity, zeros, ones, orthogonal, normal, uniform, truncated_normal, variance_scaling, glorot_normal, glorot_uniform, xavier_normal, xavier_uniform, he_normal, he_uniform, lecun_normal, lecun_uniform. Alternatively it is possible to specify a dictionary with a key type that identifies the type of initializer and other keys for its parameters, e.g. {type: normal, mean: 0, stddev: 0}. To know the parameters of each initializer, please refer to TensorFlow's documentation.
- bias_initializer (default zeros): initializer for the bias vector. The options are the same as for weights_initializer.
- weights_regularizer (default null): regularizer function applied to the weights matrix. Valid values are l1, l2 or l1_l2.
- bias_regularizer (default null): regularizer function applied to the bias vector. Valid values are l1, l2 or l1_l2.
- activity_regularizer (default null): regularizer function applied to the output of the layer. Valid values are l1, l2 or l1_l2.
- activation (default relu): if an activation is not already specified in fc_layers, this is the default activation that will be used for each layer. It indicates the activation function applied to the output.
- clip (default null): if not null, it specifies a minimum and a maximum value the predictions will be clipped to. The value can be either a list or a tuple of length 2, with the first value representing the minimum and the second the maximum. For instance (-5, 5) will make it so that all predictions are clipped to the [-5, 5] interval.
Example vector feature entry (with default parameters) in the output features list:
name: vector_column_name
type: vector
reduce_input: sum
dependencies: []
reduce_dependencies: sum
loss:
    type: mean_squared_error
fc_layers: null
num_fc_layers: 0
fc_size: 256
use_bias: true
weights_initializer: glorot_uniform
bias_initializer: zeros
weights_regularizer: null
bias_regularizer: null
activity_regularizer: null
activation: relu
clip: null
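A sketch of a non-default variant that passes the combiner output through two fully connected layers and clips predictions to a fixed interval (layer count, size and clip range chosen arbitrarily):
name: vector_column_name
type: vector
num_fc_layers: 2
fc_size: 128
clip: [-5, 5]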
Vector Features Measures
The measures that are calculated every epoch and are available for vector features are mean_squared_error, mean_absolute_error, r2 and the loss itself.
You can set any of them as validation_measure in the training section of the configuration if you set validation_field to the name of a vector feature.
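For example, a sketch of a training section that validates on the vector output feature defined above:
training:
    validation_field: vector_column_name
    validation_measure: mean_squared_error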