# Number Features

## Preprocessing

Number features are directly transformed into a float-valued vector of length `n` (where `n` is the size of the dataset) and added to the HDF5 file with a key that reflects the name of the column in the dataset.
No additional information about them is available in the JSON metadata file.

```
preprocessing:
  missing_value_strategy: fill_with_const
  normalization: zscore
  outlier_strategy: null
  fill_value: 0.0
  outlier_threshold: 3.0
```

Parameters:

- `missing_value_strategy` (default: `fill_with_const`): What strategy to follow when there's a missing value in a number column. Options: `fill_with_const`, `fill_with_mode`, `bfill`, `ffill`, `drop_row`, `fill_with_mean`. See Missing Value Strategy for details.
- `normalization` (default: `zscore`): Normalization strategy to use for this number feature. If the value is `null`, no normalization is performed. Options: `zscore`, `minmax`, `log1p`, `iq`, `null`. See Normalization for details.
- `outlier_strategy` (default: `null`): Determines how outliers will be handled in the dataset. In most cases, replacing outliers with the column mean (`fill_with_mean`) will be sufficient, but in others the outliers may be damaging enough to merit dropping the entire row of data (`drop_row`). In some cases, the best way to handle outliers is to leave them in the data, which is the behavior when this parameter is left as `null`. Options: `fill_with_const`, `fill_with_mode`, `bfill`, `ffill`, `drop_row`, `fill_with_mean`, `null`.
- `fill_value` (default: `0.0`): The value to replace missing values with when `missing_value_strategy` is `fill_with_const`.
- `outlier_threshold` (default: `3.0`): Number of standard deviations from the mean past which a value is considered an outlier. The 68–95–99.7 rule in statistics tells us that when data is normally distributed, about 95% of the data lies within 2 standard deviations of the mean, and more than 99% lies within 3 standard deviations. As such, anything farther away than that is highly likely to be an outlier, and may distort the learning process by disproportionately affecting the model.
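The z-score rule behind `outlier_threshold` can be sketched in plain Python (a standalone illustration of the statistic, not Ludwig's internal code):

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Return indices of values more than `threshold` standard
    deviations away from the column mean (the z-score rule)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [i for i, v in enumerate(values)
            if stdev > 0 and abs(v - mean) / stdev > threshold]

# A column of mostly identical values with one extreme entry:
# only the extreme entry exceeds 3 standard deviations.
column = [1.0] * 20 + [1000.0]
print(find_outliers(column))  # [20]
```

Note that the extreme value itself inflates the standard deviation, so in very small columns a single outlier may not reach 3 sigma; this is one reason the threshold is configurable.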

Preprocessing parameters can also be defined once and applied to all number input features using the Type-Global Preprocessing section.

### Normalization

Technique to use when normalizing the number feature type.

Options:

- `null`: No normalization is performed.
- `zscore`: The mean and standard deviation are computed so that values are shifted to have zero mean and unit standard deviation.
- `minmax`: The minimum is subtracted from values and the result is divided by the difference between maximum and minimum.
- `log1p`: The value returned is the natural log of 1 plus the original value. Note: `log1p` is defined only for positive values.
- `iq`: The median is subtracted from values and the result is divided by the interquartile range (IQR), i.e., the 75th percentile value minus the 25th percentile value. The resulting data has a zero median and unit interquartile range. This is useful if your feature has large outliers, since the normalization won't be skewed by those values.

The best normalization technique to use depends on the distribution of your data, but `zscore` is a good place to start in many cases.
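The four strategies can be sketched in plain Python (a hypothetical illustration of the formulas above, not Ludwig's implementation):

```python
import math
import statistics

def normalize(values, strategy):
    """Apply one of the normalization strategies described above."""
    if strategy in (None, "null"):
        return list(values)
    if strategy == "zscore":
        mean, stdev = statistics.mean(values), statistics.pstdev(values)
        return [(v - mean) / stdev for v in values]
    if strategy == "minmax":
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    if strategy == "log1p":  # defined only for positive values
        return [math.log1p(v) for v in values]
    if strategy == "iq":
        q = statistics.quantiles(values, n=4)  # q[0]=25th, q[2]=75th pct
        median, iqr = statistics.median(values), q[2] - q[0]
        return [(v - median) / iqr for v in values]
    raise ValueError(f"unknown strategy: {strategy}")

print(normalize([1.0, 2.0, 3.0, 4.0], "minmax"))
```

Running the last line maps the minimum to 0.0 and the maximum to 1.0, with the remaining values spread linearly in between.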

## Input Features

Number features have two encoders.
One encoder (`passthrough`) simply returns the raw numerical values coming from the input placeholders as outputs.
Inputs are of size `b` while outputs are of size `b x 1`, where `b` is the batch size.
The other encoder (`dense`) passes the raw numerical values through fully connected layers.
In this case the inputs of size `b` are transformed to size `b x h`, where `h` is the output size of the last fully connected layer.

The encoder parameters specified at the feature level are:

- `tied` (default: `null`): name of the input feature to tie the weights of the encoder with. It needs to be the name of a feature of the same type and with the same encoder parameters.

Example number feature entry in the input features list:

```
name: number_column_name
type: number
tied: null
encoder:
  type: dense
```

The available encoder parameters:

- `type` (default: `passthrough`): the possible values are `passthrough`, `dense` and `sparse`. `passthrough` outputs the raw numerical values unaltered, `dense` passes the values through randomly initialized trainable fully connected layers, and `sparse` uses one-hot encoding.

Encoder type and encoder parameters can also be defined once and applied to all number input features using the Type-Global Encoder section.

### Encoders

#### Passthrough Encoder

```
encoder:
  type: passthrough
```

There are no additional parameters for the `passthrough` encoder.

#### Dense Encoder

```
encoder:
  type: dense
  dropout: 0.0
  output_size: 256
  norm: null
  num_layers: 1
  activation: relu
  use_bias: true
  bias_initializer: zeros
  weights_initializer: xavier_uniform
  norm_params: null
  fc_layers: null
```

Parameters:

- `dropout` (default: `0.0`): Default dropout rate applied to fully connected layers. Increasing dropout is a common form of regularization to combat overfitting. The dropout is expressed as the probability of an element being zeroed out (0.0 means no dropout).
- `output_size` (default: `256`): Size of the output of the feature.
- `norm` (default: `null`): Default normalization applied at the beginning of fully connected layers. Options: `batch`, `layer`, `ghost`, `null`. See Normalization for details.
- `num_layers` (default: `1`): Number of stacked fully connected layers to apply. Increasing layers adds capacity to the model, enabling it to learn more complex feature interactions.
- `activation` (default: `relu`): Default activation function applied to the output of the fully connected layers. Options: `elu`, `leakyRelu`, `logSigmoid`, `relu`, `sigmoid`, `tanh`, `softmax`, `null`.
- `use_bias` (default: `true`): Whether the layer uses a bias vector. Options: `true`, `false`.
- `bias_initializer` (default: `zeros`): Initializer for the bias vector. Options: `uniform`, `normal`, `constant`, `ones`, `zeros`, `eye`, `dirac`, `xavier_uniform`, `xavier_normal`, `kaiming_uniform`, `kaiming_normal`, `orthogonal`, `sparse`, `identity`. Alternatively it is possible to specify a dictionary with a key `type` that identifies the type of initializer and other keys for its parameters, e.g. `{type: normal, mean: 0, stddev: 0}`. For a description of the parameters of each initializer, see torch.nn.init.
- `weights_initializer` (default: `xavier_uniform`): Initializer for the weight matrix. Options: `uniform`, `normal`, `constant`, `ones`, `zeros`, `eye`, `dirac`, `xavier_uniform`, `xavier_normal`, `kaiming_uniform`, `kaiming_normal`, `orthogonal`, `sparse`, `identity`. Alternatively it is possible to specify a dictionary with a key `type` that identifies the type of initializer and other keys for its parameters, e.g. `{type: normal, mean: 0, stddev: 0}`. For a description of the parameters of each initializer, see torch.nn.init.
- `norm_params` (default: `null`): Default parameters passed to the `norm` module.
- `fc_layers` (default: `null`): List of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: `activation`, `dropout`, `norm`, `norm_params`, `output_size`, `use_bias`, `bias_initializer` and `weights_initializer`. If any of those values is missing from the dictionary, the default one provided as a standalone parameter will be used instead.
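The fallback behavior of `fc_layers` can be sketched as a simple dictionary merge, where per-layer keys override the standalone defaults (a hypothetical illustration of the rule, not Ludwig's code):

```python
# Standalone defaults, standing in for the top-level encoder parameters.
defaults = {"output_size": 256, "activation": "relu", "dropout": 0.0}

# Per-layer dictionaries: missing keys fall back to the defaults above.
fc_layers = [
    {"output_size": 128},
    {"output_size": 64, "dropout": 0.5},
]

# Later keys win in a dict merge, so each layer dict overrides defaults.
resolved = [{**defaults, **layer} for layer in fc_layers]
for layer in resolved:
    print(layer)
```

The first resolved layer keeps the default activation and dropout but uses `output_size: 128`; the second additionally overrides `dropout`.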

## Output Features

Number features can be used when a regression needs to be performed. There is only one decoder available for number features: a (potentially empty) stack of fully connected layers, followed by a projection to a single number.

Example number output feature using default parameters:

```
name: number_column_name
type: number
reduce_input: sum
dependencies: []
reduce_dependencies: sum
loss:
  type: mean_squared_error
decoder:
  type: regressor
```

Parameters:

- `reduce_input` (default: `sum`): defines how to reduce an input that is not a vector, but a matrix or a higher order tensor, on the first dimension (second if you count the batch dimension). Available values are: `sum`, `mean` or `avg`, `max`, `concat` (concatenates along the first dimension), `last` (returns the last vector of the first dimension).
- `dependencies` (default: `[]`): the output features this one is dependent on. For a detailed explanation refer to Output Feature Dependencies.
- `reduce_dependencies` (default: `sum`): defines how to reduce the output of a dependent feature that is not a vector, but a matrix or a higher order tensor, on the first dimension (second if you count the batch dimension). Available values are: `sum`, `mean` or `avg`, `max`, `concat` (concatenates along the first dimension), `last` (returns the last vector of the first dimension).
- `loss` (default: `{type: mean_squared_error}`): a dictionary containing a loss `type`. Options: `mean_squared_error`, `mean_absolute_error`, `root_mean_squared_error`, `root_mean_squared_percentage_error`. See Loss for details.
- `decoder` (default: `{"type": "regressor"}`): Decoder for the desired task. Options: `regressor`. See Decoder for details.
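The reduction modes can be sketched with nested Python lists standing in for a `b x s x h` tensor (a hypothetical illustration of the semantics, not the library's implementation):

```python
def reduce_tensor(batch, mode):
    """Reduce a b x s x h tensor on its first non-batch dimension,
    mirroring the reduce_input / reduce_dependencies options."""
    if mode == "sum":
        return [[sum(col) for col in zip(*seq)] for seq in batch]
    if mode in ("mean", "avg"):
        return [[sum(col) / len(col) for col in zip(*seq)] for seq in batch]
    if mode == "max":
        return [[max(col) for col in zip(*seq)] for seq in batch]
    if mode == "concat":  # concatenate along the first dimension
        return [[x for vec in seq for x in vec] for seq in batch]
    if mode == "last":  # keep only the last vector of each sequence
        return [seq[-1] for seq in batch]
    raise ValueError(f"unknown mode: {mode}")

# One batch element: a sequence of three 2-vectors (b=1, s=3, h=2).
batch = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]
print(reduce_tensor(batch, "sum"))   # [[9.0, 12.0]]
print(reduce_tensor(batch, "last"))  # [[5.0, 6.0]]
```

`sum`, `mean` and `max` collapse the sequence dimension to produce `b x h`, while `concat` produces `b x (s * h)`.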

### Decoders

#### Regressor

```
decoder:
  type: regressor
  num_fc_layers: 0
  fc_output_size: 256
  fc_norm: null
  fc_dropout: 0.0
  fc_activation: relu
  fc_layers: null
  fc_use_bias: true
  fc_weights_initializer: xavier_uniform
  fc_bias_initializer: zeros
  fc_norm_params: null
  use_bias: true
  weights_initializer: xavier_uniform
  bias_initializer: zeros
```

Parameters:

- `num_fc_layers` (default: `0`): Number of fully connected layers if `fc_layers` is not specified. Increasing layers adds capacity to the model, enabling it to learn more complex feature interactions.
- `fc_output_size` (default: `256`): Output size of the fully connected stack.
- `fc_norm` (default: `null`): Default normalization applied at the beginning of fully connected layers. Options: `batch`, `layer`, `ghost`, `null`. See Normalization for details.
- `fc_dropout` (default: `0.0`): Default dropout rate applied to fully connected layers. Increasing dropout is a common form of regularization to combat overfitting. The dropout is expressed as the probability of an element being zeroed out (0.0 means no dropout).
- `fc_activation` (default: `relu`): Default activation function applied to the output of the fully connected layers. Options: `elu`, `leakyRelu`, `logSigmoid`, `relu`, `sigmoid`, `tanh`, `softmax`, `null`.
- `fc_layers` (default: `null`): List of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: `activation`, `dropout`, `norm`, `norm_params`, `output_size`, `use_bias`, `bias_initializer` and `weights_initializer`. If any of those values is missing from the dictionary, the default one provided as a standalone parameter will be used instead.
- `fc_use_bias` (default: `true`): Whether the layer uses a bias vector in the fc_stack. Options: `true`, `false`.
- `fc_weights_initializer` (default: `xavier_uniform`): The weights initializer to use for the layers in the fc_stack.
- `fc_bias_initializer` (default: `zeros`): The bias initializer to use for the layers in the fc_stack.
- `fc_norm_params` (default: `null`): Default parameters passed to the `norm` module.
- `use_bias` (default: `true`): Whether the layer uses a bias vector. Options: `true`, `false`.
- `weights_initializer` (default: `xavier_uniform`): Initializer for the weight matrix. Options: `uniform`, `normal`, `constant`, `ones`, `zeros`, `eye`, `dirac`, `xavier_uniform`, `xavier_normal`, `kaiming_uniform`, `kaiming_normal`, `orthogonal`, `sparse`, `identity`.
- `bias_initializer` (default: `zeros`): Initializer for the bias vector. Options: `uniform`, `normal`, `constant`, `ones`, `zeros`, `eye`, `dirac`, `xavier_uniform`, `xavier_normal`, `kaiming_uniform`, `kaiming_normal`, `orthogonal`, `sparse`, `identity`.

Decoder type and decoder parameters can also be defined once and applied to all number output features using the Type-Global Decoder section.

### Loss

#### Mean Squared Error (MSE)

```
loss:
  type: mean_squared_error
  weight: 1.0
```

Parameters:

- `weight` (default: `1.0`): Weight of the loss.

#### Mean Absolute Error (MAE)

```
loss:
  type: mean_absolute_error
  weight: 1.0
```

Parameters:

- `weight` (default: `1.0`): Weight of the loss.

#### Mean Absolute Percentage Error (MAPE)

```
loss:
  type: mean_absolute_percentage_error
  weight: 1.0
```

Parameters:

- `weight` (default: `1.0`): Weight of the loss.

#### Root Mean Squared Error (RMSE)

```
loss:
  type: root_mean_squared_error
  weight: 1.0
```

Parameters:

- `weight` (default: `1.0`): Weight of the loss.

#### Root Mean Squared Percentage Error (RMSPE)

```
loss:
  type: root_mean_squared_percentage_error
  weight: 1.0
```

Parameters:

- `weight` (default: `1.0`): Weight of the loss.

#### Huber Loss

```
loss:
  type: huber
  weight: 1.0
  delta: 1.0
```

Parameters:

- `weight` (default: `1.0`): Weight of the loss.
- `delta` (default: `1.0`): Threshold at which to change between delta-scaled L1 and L2 loss.
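The role of `delta` follows from the standard definition of the Huber loss, sketched here for a single prediction (a standalone illustration, not Ludwig's implementation):

```python
def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic (L2) for small residuals, delta-scaled
    linear (L1) once the residual exceeds the `delta` threshold."""
    r = abs(y_true - y_pred)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)

print(huber(0.0, 0.5))  # small residual -> L2 branch: 0.125
print(huber(0.0, 3.0))  # large residual -> L1 branch: 2.5
```

The two branches meet smoothly at `r = delta`, which is what makes Huber loss less sensitive to outliers than pure MSE while remaining differentiable near zero.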

Loss and loss related parameters can also be defined once and applied to all number output features using the Type-Global Loss section.

### Metrics

The metrics that are calculated every epoch and are available for number features are `mean_squared_error`, `mean_absolute_error`, `root_mean_squared_error`, `root_mean_squared_percentage_error` and the `loss` itself.
You can set any of them as `validation_metric` in the `training` section of the configuration if you set the `validation_field` to be the name of a number feature.
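For example, to validate on a number output feature using RMSE, the relevant parts of a configuration might look like this (a sketch; `price` is a hypothetical feature name standing in for your own column):

```
output_features:
  - name: price
    type: number
training:
  validation_field: price
  validation_metric: root_mean_squared_error
```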