# Image Features

## Image Features Preprocessing
Ludwig supports both grayscale and color images.
The number of channels is inferred, but make sure all your images have the same number of channels.
During preprocessing, raw image files are transformed into numpy ndarrays and saved in the hdf5 format.
All images in the dataset should have the same size.
If they have different sizes, a `resize_method`, together with a target `width` and `height`, must be specified in the feature preprocessing parameters.
- `missing_value_strategy` (default `backfill`): what strategy to follow when there's a missing value in an image column. The value should be one of `fill_with_const` (replaces the missing value with a specific value specified with the `fill_value` parameter), `fill_with_mode` (replaces the missing values with the most frequent value in the column), `fill_with_mean` (replaces the missing values with the mean of the values in the column), `backfill` (replaces the missing values with the next valid value).
- `in_memory` (default `true`): defines whether the image dataset will reside in memory during the training process or will be dynamically fetched from disk (useful for large datasets). In the latter case a training batch of input images will be fetched from disk at each training iteration.
- `num_processes` (default `1`): specifies the number of processes to run for preprocessing images.
- `resize_method` (default `crop_or_pad`): available options: `crop_or_pad` - crops images larger than the specified `width` and `height` to the desired size or pads smaller images using edge padding; `interpolate` - uses interpolation to resize images to the specified `width` and `height`.
- `height` (default `null`): image height in pixels, must be set if resizing is required.
- `width` (default `null`): image width in pixels, must be set if resizing is required.
- `num_channels` (default `null`): number of channels in the images. By default, if the value is `null`, the number of channels of the first image of the dataset will be used, and if there is an image in the dataset with a different number of channels, an error will be reported. If the value specified is not `null`, images in the dataset will be adapted to the specified number of channels. If the value is `1`, all images with more than one channel will be grayscaled and reduced to one channel (transparency will be lost). If the value is `3`, all images with 1 channel will be repeated 3 times to obtain 3 channels, while images with 4 channels will lose the transparency channel. If the value is `4`, all the images with less than 4 channels will have the remaining channels filled with zeros.
- `scaling` (default `pixel_normalization`): what scaling to perform on images. By default `pixel_normalization` is performed, which consists of dividing each pixel value by 255, but `pixel_standardization` is also available, which uses TensorFlow's per-image standardization.
Depending on the application, it is preferable not to exceed a size of `256 x 256`: larger sizes will, in most cases, not provide much advantage in terms of performance, while they will considerably slow down training and inference and make both the forward and backward passes consume considerably more memory, leading to memory overflows on machines with limited RAM or on GPUs with limited VRAM.
Example of a preprocessing specification:
```yaml
name: image_feature_name
type: image
preprocessing:
    height: 128
    width: 128
    resize_method: interpolate
    scaling: pixel_normalization
```
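As a further sketch, the preprocessing parameters above can be combined to handle a large dataset that does not fit in memory and whose images have mixed channel counts; the specific sizes below are illustrative, not recommendations:

```yaml
name: image_feature_name
type: image
preprocessing:
    in_memory: false       # fetch each training batch of images from disk
    num_channels: 3        # coerce grayscale/RGBA images to 3 channels
    height: 64             # illustrative target size, not a recommendation
    width: 64
    resize_method: crop_or_pad
```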
## Image Input Features and Encoders
Input image features are transformed into float-valued tensors of size `N x H x W x C` (where `N` is the size of the dataset, `H x W` is a specific resizing of the image that can be set, and `C` is the number of channels) and added to HDF5 with a key that reflects the name of the column in the dataset.
The column name is added to the JSON file, with an associated dictionary containing preprocessing information about the sizes of the resizing.
Currently there are two encoders supported for images: the Convolutional Stack Encoder and the ResNet Encoder, which can be selected by setting the `encoder` parameter to `stacked_cnn` or `resnet` in the input feature dictionary in the configuration (`stacked_cnn` is the default).
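For instance, a minimal input feature entry selecting the ResNet encoder (a sketch using only the `encoder` parameter described above; all other parameters keep their defaults):

```yaml
name: image_column_name
type: image
encoder: resnet
```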
### Convolutional Stack Encoder
The Convolutional Stack Encoder takes the following optional parameters:
- `conv_layers` (default `null`): a list of dictionaries containing the parameters of all the convolutional layers. The length of the list determines the number of stacked convolutional layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: `filter_size`, `num_filters`, `pool_size`, `norm`, `activation` and `regularize`. If any of those values is missing from the dictionary, the default one specified as a parameter of the encoder will be used instead. If both `conv_layers` and `num_conv_layers` are `null`, a default list will be assigned to `conv_layers` with the value `[{filter_size: 7, pool_size: 3, regularize: false}, {filter_size: 7, pool_size: 3, regularize: false}, {filter_size: 3, pool_size: null, regularize: false}, {filter_size: 3, pool_size: null, regularize: false}, {filter_size: 3, pool_size: null, regularize: true}, {filter_size: 3, pool_size: 3, regularize: true}]` (see the sketch after this list for a custom specification).
- `num_conv_layers` (default `null`): if `conv_layers` is `null`, this is the number of stacked convolutional layers.
- `filter_size` (default `3`): if a `filter_size` is not already specified in `conv_layers`, this is the default `filter_size` that will be used for each layer. It indicates the width of the 2d convolutional filter.
- `num_filters` (default `256`): if a `num_filters` is not already specified in `conv_layers`, this is the default `num_filters` that will be used for each layer. It indicates the number of filters, and by consequence the number of output channels of the 2d convolution.
- `strides` (default `(1, 1)`): the strides of the convolution along the height and width.
- `padding` (default `valid`): one of `valid` or `same`.
- `dilation_rate` (default `(1, 1)`): the dilation rate to use for dilated convolution.
- `conv_use_bias` (default `true`): boolean, whether the layer uses a bias vector.
- `conv_weights_initializer` (default `glorot_uniform`): initializer for the weights matrix. Options are: `constant`, `identity`, `zeros`, `ones`, `orthogonal`, `normal`, `uniform`, `truncated_normal`, `variance_scaling`, `glorot_normal`, `glorot_uniform`, `xavier_normal`, `xavier_uniform`, `he_normal`, `he_uniform`, `lecun_normal`, `lecun_uniform`. Alternatively it is possible to specify a dictionary with a key `type` that identifies the type of initializer and other keys for its parameters, e.g. `{type: normal, mean: 0, stddev: 0}`. To know the parameters of each initializer, please refer to TensorFlow's documentation.
- `conv_bias_initializer` (default `zeros`): initializer for the bias vector. Options are the same as for `conv_weights_initializer`.
- `weights_regularizer` (default `null`): regularizer function applied to the weights matrix. Valid values are `l1`, `l2` or `l1_l2`.
- `conv_bias_regularizer` (default `null`): regularizer function applied to the bias vector. Valid values are `l1`, `l2` or `l1_l2`.
- `conv_activity_regularizer` (default `null`): regularizer function applied to the output of the layer. Valid values are `l1`, `l2` or `l1_l2`.
- `conv_norm` (default `null`): if a `norm` is not already specified in `conv_layers`, this is the default `norm` that will be used for each layer. It indicates the normalization applied to the output and it can be `null`, `batch` or `layer`.
- `conv_norm_params` (default `null`): parameters used if `conv_norm` is either `batch` or `layer`. For information on parameters used with `batch` see TensorFlow's documentation on batch normalization, or for `layer` see TensorFlow's documentation on layer normalization.
- `conv_activation` (default `relu`): if an `activation` is not already specified in `conv_layers`, this is the default `activation` that will be used for each layer. It indicates the activation function applied to the output.
- `conv_dropout` (default `0`): dropout rate.
- `pool_function` (default `max`): pooling function: `max` will select the maximum value; any of `average`, `avg` or `mean` will compute the mean value.
- `pool_size` (default `(2, 2)`): if a `pool_size` is not already specified in `conv_layers`, this is the default `pool_size` that will be used for each layer. It indicates the size of the max pooling that will be performed along the height and width dimensions after the convolution operation.
- `pool_strides` (default `null`): factor to scale down.
- `fc_layers` (default `null`): a list of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: `fc_size`, `norm`, `activation` and `regularize`. If any of those values is missing from the dictionary, the default one specified as a parameter of the encoder will be used instead. If both `fc_layers` and `num_fc_layers` are `null`, a default list will be assigned to `fc_layers` with the value `[{fc_size: 512}, {fc_size: 256}]` (only applies if `reduce_output` is not `null`).
- `num_fc_layers` (default `1`): this is the number of stacked fully connected layers.
- `fc_size` (default `256`): if a `fc_size` is not already specified in `fc_layers`, this is the default `fc_size` that will be used for each layer. It indicates the size of the output of a fully connected layer.
- `fc_use_bias` (default `true`): boolean, whether the layer uses a bias vector.
- `fc_weights_initializer` (default `glorot_uniform`): initializer for the weights matrix. Options are the same as for `conv_weights_initializer`.
- `fc_bias_initializer` (default `zeros`): initializer for the bias vector. Options are the same as for `conv_weights_initializer`.
- `fc_weights_regularizer` (default `null`): regularizer function applied to the weights matrix. Valid values are `l1`, `l2` or `l1_l2`.
- `fc_bias_regularizer` (default `null`): regularizer function applied to the bias vector. Valid values are `l1`, `l2` or `l1_l2`.
- `fc_activity_regularizer` (default `null`): regularizer function applied to the output of the layer. Valid values are `l1`, `l2` or `l1_l2`.
- `fc_norm` (default `null`): if a `norm` is not already specified in `fc_layers`, this is the default `norm` that will be used for each layer. It indicates the normalization applied to the output and it can be `null`, `batch` or `layer`.
- `fc_norm_params` (default `null`): parameters used if `fc_norm` is either `batch` or `layer`. For information on parameters used with `batch` see TensorFlow's documentation on batch normalization, or for `layer` see TensorFlow's documentation on layer normalization.
- `fc_activation` (default `relu`): if an `activation` is not already specified in `fc_layers`, this is the default `activation` that will be used for each layer. It indicates the activation function applied to the output.
- `fc_dropout` (default `0`): dropout rate.
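As a sketch of the per-layer overrides described for `conv_layers` and `fc_layers` above (the layer keys are the documented ones, but the specific filter counts and sizes are illustrative only):

```yaml
name: image_column_name
type: image
encoder: stacked_cnn
conv_layers:
    - {filter_size: 7, num_filters: 32, pool_size: 3}   # illustrative values
    - {filter_size: 3, num_filters: 64, pool_size: 2}
fc_layers:
    - {fc_size: 128}
    - {fc_size: 64}
```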
Example image feature entry using a convolutional stack encoder (with default parameters) in the input features list:
```yaml
name: image_column_name
type: image
encoder: stacked_cnn
tied_weights: null
conv_layers: null
num_conv_layers: null
filter_size: 3
num_filters: 256
strides: (1, 1)
padding: valid
dilation_rate: (1, 1)
conv_use_bias: true
conv_weights_initializer: glorot_uniform
conv_bias_initializer: zeros
weights_regularizer: null
conv_bias_regularizer: null
conv_activity_regularizer: null
conv_norm: null
conv_norm_params: null
conv_activation: relu
conv_dropout: 0
pool_function: max
pool_size: (2, 2)
pool_strides: null
fc_layers: null
num_fc_layers: 1
fc_size: 256
fc_use_bias: true
fc_weights_initializer: glorot_uniform
fc_bias_initializer: zeros
fc_weights_regularizer: null
fc_bias_regularizer: null
fc_activity_regularizer: null
fc_norm: null
fc_norm_params: null
fc_activation: relu
fc_dropout: 0
preprocessing:  # example preprocessing
    height: 28
    width: 28
    num_channels: 1
```
### ResNet Encoder
The ResNet Encoder takes the following optional parameters:
- `resnet_size` (default `50`): a single integer for the size of the ResNet model. It has to be one of the following values: `8`, `14`, `18`, `34`, `50`, `101`, `152`, `200`.
- `num_filters` (default `16`): it indicates the number of filters, and by consequence the number of output channels of the 2d convolution.
- `kernel_size` (default `3`): the kernel size to use for convolution.
- `conv_stride` (default `1`): stride size for the initial convolutional layer.
- `first_pool_size` (default `null`): pool size to be used for the first pooling layer. If `null`, the first pooling layer is skipped.
- `batch_norm_momentum` (default `0.9`): momentum of the batch norm running statistics. The suggested parameter in TensorFlow's implementation is `0.997`, but that leads to a big discrepancy between the normalization at training time and test time, so the default value is a more conservative `0.9`.
- `batch_norm_epsilon` (default `0.001`): epsilon of the batch norm. The suggested parameter in TensorFlow's implementation is `1e-5`, but that leads to a big discrepancy between the normalization at training time and test time, so the default value is a more conservative `0.001`.
- `fc_layers` (default `null`): a list of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: `fc_size`, `norm`, `activation` and `regularize`. If any of those values is missing from the dictionary, the default one specified as a parameter of the encoder will be used instead. If both `fc_layers` and `num_fc_layers` are `null`, a default list will be assigned to `fc_layers` with the value `[{fc_size: 512}, {fc_size: 256}]` (only applies if `reduce_output` is not `null`).
- `num_fc_layers` (default `1`): this is the number of stacked fully connected layers.
- `fc_size` (default `256`): if a `fc_size` is not already specified in `fc_layers`, this is the default `fc_size` that will be used for each layer. It indicates the size of the output of a fully connected layer.
- `use_bias` (default `true`): boolean, whether the layer uses a bias vector.
- `weights_initializer` (default `glorot_uniform`): initializer for the weights matrix. Options are: `constant`, `identity`, `zeros`, `ones`, `orthogonal`, `normal`, `uniform`, `truncated_normal`, `variance_scaling`, `glorot_normal`, `glorot_uniform`, `xavier_normal`, `xavier_uniform`, `he_normal`, `he_uniform`, `lecun_normal`, `lecun_uniform`. Alternatively it is possible to specify a dictionary with a key `type` that identifies the type of initializer and other keys for its parameters, e.g. `{type: normal, mean: 0, stddev: 0}` (see the sketch after this list). To know the parameters of each initializer, please refer to TensorFlow's documentation.
- `bias_initializer` (default `zeros`): initializer for the bias vector. Options are the same as for `weights_initializer`.
- `weights_regularizer` (default `null`): regularizer function applied to the weights matrix. Valid values are `l1`, `l2` or `l1_l2`.
- `bias_regularizer` (default `null`): regularizer function applied to the bias vector. Valid values are `l1`, `l2` or `l1_l2`.
- `activity_regularizer` (default `null`): regularizer function applied to the output of the layer. Valid values are `l1`, `l2` or `l1_l2`.
- `norm` (default `null`): if a `norm` is not already specified in `fc_layers`, this is the default `norm` that will be used for each layer. It indicates the normalization applied to the output and it can be `null`, `batch` or `layer`.
- `norm_params` (default `null`): parameters used if `norm` is either `batch` or `layer`. For information on parameters used with `batch` see TensorFlow's documentation on batch normalization, or for `layer` see TensorFlow's documentation on layer normalization.
- `activation` (default `relu`): if an `activation` is not already specified in `fc_layers`, this is the default `activation` that will be used for each layer. It indicates the activation function applied to the output.
- `dropout` (default `0`): dropout rate.
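For example, a sketch of the dictionary form of `weights_initializer` described above (the `stddev` value is an arbitrary placeholder, not a recommended setting):

```yaml
name: image_column_name
type: image
encoder: resnet
weights_initializer: {type: normal, mean: 0, stddev: 0.05}  # stddev is a placeholder
```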
Example image feature entry using a ResNet encoder (with default parameters) in the input features list:
```yaml
name: image_column_name
type: image
encoder: resnet
tied_weights: null
resnet_size: 50
num_filters: 16
kernel_size: 3
conv_stride: 1
first_pool_size: null
batch_norm_momentum: 0.9
batch_norm_epsilon: 0.001
fc_layers: null
num_fc_layers: 1
fc_size: 256
use_bias: true
weights_initializer: glorot_uniform
bias_initializer: zeros
weights_regularizer: null
bias_regularizer: null
activity_regularizer: null
norm: null
norm_params: null
activation: relu
dropout: 0
preprocessing:
    height: 224
    width: 224
    num_channels: 3
```
## Image Output Features and Decoders
There are no image decoders at the moment (WIP), so images cannot be used as output features.
## Image Features Measures
As no image decoders are available at the moment, there are also no image measures.