# Date Features

## Date Features Preprocessing
Ludwig will try to infer the date format automatically, but a specific format can be provided. The format is the same one described in the datetime package documentation.
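As a concrete illustration, the format strings are standard Python strftime/strptime directives, so a format like `"%d %b %Y"` can be checked directly with the standard library before putting it in the configuration:

```python
from datetime import datetime

# Parse a date string with an explicit strptime format, the same
# directive syntax accepted by the datetime_format parameter.
parsed = datetime.strptime("28 Feb 2020", "%d %b %Y")
print(parsed.year, parsed.month, parsed.day)  # → 2020 2 28
```

If `strptime` raises a `ValueError` on a sample of your column, the format string will not match your data.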
- `missing_value_strategy` (default `fill_with_const`): what strategy to follow when there's a missing value in a date column. The value should be one of `fill_with_const` (replaces the missing value with a specific value specified with the `fill_value` parameter), `fill_with_mode` (replaces the missing values with the most frequent value in the column), `fill_with_mean` (replaces the missing values with the mean of the values in the column), `backfill` (replaces the missing values with the next valid value).
- `fill_value` (default `""`): the value to replace the missing values with in case the `missing_value_strategy` is `fill_with_const`. This can be a datetime string; if left empty, the current datetime will be used.
- `datetime_format` (default `null`): this parameter can be either `null`, which implies the datetime format is inferred automatically, or a datetime format string.
Example of a preprocessing specification:
```yaml
name: date_feature_name
type: date
preprocessing:
    missing_value_strategy: fill_with_const
    fill_value: ''
    datetime_format: "%d %b %Y"
```
## Date Input Features and Encoders
Input date features are transformed into int-valued tensors of size N x 8 (where N is the size of the dataset and the 8 dimensions contain year, month, day, weekday, yearday, hour, minute and second) and added to HDF5 with a key that reflects the name of the column in the dataset.
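The 8 components can be sketched with the standard library. This is a hypothetical helper, not Ludwig's internal code, and conventions such as weekday numbering (here Python's, Monday = 0) are assumptions:

```python
from datetime import datetime

def date_to_vector(dt: datetime) -> list:
    # Hypothetical helper mirroring the 8 components described above:
    # year, month, day, weekday, yearday, hour, minute, second.
    return [
        dt.year,
        dt.month,
        dt.day,
        dt.weekday(),            # assumed convention: Monday = 0
        dt.timetuple().tm_yday,  # day of the year, starting at 1
        dt.hour,
        dt.minute,
        dt.second,
    ]

print(date_to_vector(datetime(2020, 2, 28, 13, 30, 5)))
# → [2020, 2, 28, 4, 59, 13, 30, 5]
```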
Currently two encoders are supported for dates: the embed encoder and the wave encoder, which can be selected by setting the `encoder` parameter to `embed` or `wave` in the input feature dictionary in the configuration (`embed` is the default).
### Embed Encoder
This encoder passes the year through a fully connected layer with one neuron, embeds all the other date elements, concatenates them, and passes the concatenated representation through fully connected layers. It takes the following optional parameters:
- `embedding_size` (default `10`): it is the maximum embedding size adopted.
- `embeddings_on_cpu` (default `false`): by default embedding matrices are stored on GPU memory if a GPU is used, as it allows for faster access, but in some cases the embedding matrix may be really big and this parameter forces the placement of the embedding matrix in regular memory and the CPU is used to resolve them, slightly slowing down the process as a result of data transfer between CPU and GPU memory.
- `dropout` (default `false`): determines if there should be a dropout layer before returning the encoder output.
- `fc_layers` (default `null`): it is a list of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: `fc_size`, `norm`, `activation` and `regularize`. If any of those values is missing from the dictionary, the default one specified as a parameter of the encoder will be used instead. If both `fc_layers` and `num_fc_layers` are `null`, a default list will be assigned to `fc_layers` with the value `[{fc_size: 512}, {fc_size: 256}]` (only applies if `reduce_output` is not `null`).
- `num_fc_layers` (default `0`): the number of stacked fully connected layers.
- `fc_size` (default `10`): if an `fc_size` is not already specified in `fc_layers` this is the default `fc_size` that will be used for each layer. It indicates the size of the output of a fully connected layer.
- `use_bias` (default `true`): boolean, whether the layer uses a bias vector.
- `weights_initializer` (default `'glorot_uniform'`): initializer for the weights matrix. Options are: `constant`, `identity`, `zeros`, `ones`, `orthogonal`, `normal`, `uniform`, `truncated_normal`, `variance_scaling`, `glorot_normal`, `glorot_uniform`, `xavier_normal`, `xavier_uniform`, `he_normal`, `he_uniform`, `lecun_normal`, `lecun_uniform`. Alternatively it is possible to specify a dictionary with a key `type` that identifies the type of initializer and other keys for its parameters, e.g. `{type: normal, mean: 0, stddev: 0}`. To know the parameters of each initializer, please refer to TensorFlow's documentation.
- `bias_initializer` (default `'zeros'`): initializer for the bias vector. The options are the same as for `weights_initializer`.
- `weights_regularizer` (default `null`): regularizer function applied to the weights matrix. Valid values are `l1`, `l2` or `l1_l2`.
- `bias_regularizer` (default `null`): regularizer function applied to the bias vector. Valid values are `l1`, `l2` or `l1_l2`.
- `activity_regularizer` (default `null`): regularizer function applied to the output of the layer. Valid values are `l1`, `l2` or `l1_l2`.
- `norm` (default `null`): if a `norm` is not already specified in `fc_layers` this is the default `norm` that will be used for each layer. It indicates the normalization applied to the output and can be `null`, `batch` or `layer`.
- `norm_params` (default `null`): parameters used if `norm` is either `batch` or `layer`. For information on parameters used with `batch` see TensorFlow's documentation on batch normalization, or for `layer` see TensorFlow's documentation on layer normalization.
- `activation` (default `relu`): if an `activation` is not already specified in `fc_layers` this is the default `activation` that will be used for each layer. It indicates the activation function applied to the output.
- `dropout` (default `0`): dropout rate.
Example date feature entry in the input features list using an embed encoder:
```yaml
name: date_column_name
type: date
encoder: embed
embedding_size: 10
embeddings_on_cpu: false
dropout: false
fc_layers: null
num_fc_layers: 0
fc_size: 10
use_bias: true
weights_initializer: glorot_uniform
bias_initializer: zeros
weights_regularizer: null
bias_regularizer: null
activity_regularizer: null
norm: null
norm_params: null
activation: relu
dropout: 0
```
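The embed encoder's mechanics can be sketched in a few lines of NumPy. This is a minimal illustration of the idea (one embedding table per non-year component, a single-neuron layer for the year, then concatenation), not Ludwig's actual implementation; the table sizes and weights here are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
embedding_size = 10

# Assumed cardinalities for each non-year component (illustrative only).
cardinalities = {"month": 13, "day": 32, "weekday": 7,
                 "yearday": 367, "hour": 24, "minute": 60, "second": 60}
tables = {k: rng.normal(size=(n, embedding_size))
          for k, n in cardinalities.items()}
w_year = rng.normal(size=(1,))  # single-neuron layer for the year

def encode(year, month, day, weekday, yearday, hour, minute, second):
    parts = [np.atleast_1d(w_year * year)]  # year -> one scalar
    values = dict(month=month, day=day, weekday=weekday, yearday=yearday,
                  hour=hour, minute=minute, second=second)
    parts += [tables[k][v] for k, v in values.items()]  # embedding lookups
    # The concatenation is what downstream fully connected layers consume.
    return np.concatenate(parts)

vec = encode(2020, 2, 28, 4, 59, 13, 30, 5)
print(vec.shape)  # → (71,)  = 1 (year) + 7 components * embedding_size
```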
### Wave Encoder
This encoder passes the year through a fully connected layer with one neuron and represents each of the other date elements by taking the sine of its value with a different period (12 for months, 31 for days, etc.), concatenates them, and passes the concatenated representation through fully connected layers. It takes the following optional parameters:
- `fc_layers` (default `null`): it is a list of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: `fc_size`, `norm`, `activation` and `regularize`. If any of those values is missing from the dictionary, the default one specified as a parameter of the encoder will be used instead. If both `fc_layers` and `num_fc_layers` are `null`, a default list will be assigned to `fc_layers` with the value `[{fc_size: 512}, {fc_size: 256}]` (only applies if `reduce_output` is not `null`).
- `num_fc_layers` (default `0`): the number of stacked fully connected layers.
- `fc_size` (default `10`): if an `fc_size` is not already specified in `fc_layers` this is the default `fc_size` that will be used for each layer. It indicates the size of the output of a fully connected layer.
- `use_bias` (default `true`): boolean, whether the layer uses a bias vector.
- `weights_initializer` (default `'glorot_uniform'`): initializer for the weights matrix. Options are: `constant`, `identity`, `zeros`, `ones`, `orthogonal`, `normal`, `uniform`, `truncated_normal`, `variance_scaling`, `glorot_normal`, `glorot_uniform`, `xavier_normal`, `xavier_uniform`, `he_normal`, `he_uniform`, `lecun_normal`, `lecun_uniform`. Alternatively it is possible to specify a dictionary with a key `type` that identifies the type of initializer and other keys for its parameters, e.g. `{type: normal, mean: 0, stddev: 0}`. To know the parameters of each initializer, please refer to TensorFlow's documentation.
- `bias_initializer` (default `'zeros'`): initializer for the bias vector. The options are the same as for `weights_initializer`.
- `weights_regularizer` (default `null`): regularizer function applied to the weights matrix. Valid values are `l1`, `l2` or `l1_l2`.
- `bias_regularizer` (default `null`): regularizer function applied to the bias vector. Valid values are `l1`, `l2` or `l1_l2`.
- `activity_regularizer` (default `null`): regularizer function applied to the output of the layer. Valid values are `l1`, `l2` or `l1_l2`.
- `norm` (default `null`): if a `norm` is not already specified in `fc_layers` this is the default `norm` that will be used for each layer. It indicates the normalization applied to the output and can be `null`, `batch` or `layer`.
- `norm_params` (default `null`): parameters used if `norm` is either `batch` or `layer`. For information on parameters used with `batch` see TensorFlow's documentation on batch normalization, or for `layer` see TensorFlow's documentation on layer normalization.
- `activation` (default `relu`): if an `activation` is not already specified in `fc_layers` this is the default `activation` that will be used for each layer. It indicates the activation function applied to the output.
- `dropout` (default `0`): dropout rate.
Example date feature entry in the input features list using a wave encoder:
```yaml
name: date_column_name
type: date
encoder: wave
fc_layers: null
num_fc_layers: 0
fc_size: 10
use_bias: true
weights_initializer: glorot_uniform
bias_initializer: zeros
weights_regularizer: null
bias_regularizer: null
activity_regularizer: null
norm: null
norm_params: null
activation: relu
dropout: 0
```
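The sine representation is easy to illustrate. The sketch below is an assumption about the general shape of the transformation (each periodic component mapped to `sin(2π · value / period)`), not Ludwig's actual code, and the exact periods used internally may differ:

```python
import math

# Assumed period per periodic component (illustrative values).
periods = {"month": 12, "day": 31, "weekday": 7,
           "yearday": 366, "hour": 24, "minute": 60, "second": 60}

def wave_features(month, day, weekday, yearday, hour, minute, second):
    # Map each component onto a sine wave with its own period; the year
    # would instead pass through a one-neuron fully connected layer.
    values = dict(month=month, day=day, weekday=weekday, yearday=yearday,
                  hour=hour, minute=minute, second=second)
    return [math.sin(2 * math.pi * values[k] / p) for k, p in periods.items()]

feats = wave_features(2, 28, 4, 59, 13, 30, 5)
print([round(f, 3) for f in feats])
```

Because the sine is periodic, nearby values on a cyclical scale (e.g. minute 59 and minute 0) end up close in feature space, which is the motivation for this encoder.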
## Date Output Features and Decoders

There are no date decoders at the moment (WIP), so date features cannot be used as output features.
## Date Features Measures
As no date decoders are available at the moment, there are also no date measures.