Hyperopt
The hyperopt
section of the Ludwig configuration defines what metrics to optimize for, which parameters to optimize,
search strategy and execution strategy.
hyperopt:
goal: minimize
output_feature: combined
metric: loss
split: validation
parameters:
title.cell_type: ... # title is a text feature type
title.num_layers: ...
combiner.num_fc_layers: ...
section.embedding_size: ...
preprocessing.text.vocab_size: ...
trainer.learning_rate: ...
trainer.optimizer.type: ...
...
search_alg:
type: variant_generator # random, hyperopt, bohb, ...
# search_alg parameters...
executor:
type: ray
num_samples: ...
scheduler:
type: fifo # hb_bohb, asynchyperband, ...
# scheduler parameters...
Hyperopt configuration parameters¶
goal
which indicates if to minimize or maximize a metric or a loss of any of the output features on any of the dataset splits. Available values are:minimize
(default) ormaximize
.output_feature
is astr
containing the name of the output feature that we want to optimize the metric or loss of. Available values arecombined
(default) or the name of any output feature provided in the configuration.combined
is a special output feature that allows to optimize for the aggregated loss and metrics of all output features.metric
is the metric that we want to optimize for. The default one isloss
, but depending on the type of the feature defined inoutput_feature
, different metrics and losses are available. Check the metrics section of the specific output feature type to figure out what metrics are available to use.split
is the split of data that we want to compute our metric on. By default it is thevalidation
split, but you have the flexibility to specify alsotrain
ortest
splits.parameters
section consists of a set of hyperparameters to optimize. They are provided as keys (the names of the parameters) and values associated with them (that define the search space). The values vary depending on the type of the hyperparameter. Syntax for this section is based on Ray Tune's Search Space parameters.search_alg
section specifies the algorithm to sample the definedparameters
space. Candidate algorithms are those found in Ray Tune's Search Algorithms.executor
section specifies how to execute the hyperparameter optimization. The execution could happen locally in a serial manner or in parallel across multiple workers and with GPUs as well if available. Theexecutor
section includes specification for work scheduling and the number of samples to generate.
Defining hyperparameter search spaces¶
In the parameters
section, hyperparameters are dot( .
) separate names. The parts of the hyperparameter names separated by .
are references to nested sections in the Ludwig configuration.
For instance, to reference the learning_rate
, in the trainer
section one would use the name trainer.learning_rate
.
If the parameter to reference is inside an input or output feature, the name of that feature will be be used as starting point.
For instance, for referencing the cell_type
of the title
feature, use the name title.cell_type
.
Numeric Hyperparameters¶
space
: Use Ray Tune's Search Space types, e.g.,uniform
,quniform
,loguniform
,choice
, etc. Refer the cited page for details.
For numeric spaces
, these define the range where the value is generated
lower
: the minimum value the parameter can haveupper
: the maximum value the parameter can haveq
: quantization number, used inspaces
such asquniform
,qloguniform
,qrandn
,qrandint
,qlograndint
base
: defines the base of the log forloguniform
,qloguniform
,lograndint
andqlograndint
Note
Depending on the specific numeric space
, the upper
parameter may be inclusive or excluse. Refer to the Ray Tune documentation for the specific distribution for details.
Float example: Uniform floating point random values (in log space) between 0.001 and 0.1
trainer.learning_rate:
space: loguniform
lower: 0.001
upper: 0.1
Integer example: Uniform random integer values 1, 2, 3
combiner.num_fc_layers:
space: randint
lower: 1
upper: 4
Quantized Example: Uniform random floating point values such a 0, 0.1, 0.2, ..., 0.9
my_output_feature.dropout:
space: quniform
lower: 0
upper: 1
q: 0.1
Categorical Hyperparameters¶
space
: Usechoice
.categories
: a list of possible values. The type of each value of the list is general, i.e., they could be strings, integers, floats and anything else, even entire dictionaries. The values will be a uniform random selection.
Example:
title.cell_type:
space: choice
categories: [rnn, gru, lstm]
Hyperparameters in a Grid¶
For space
: grid_search
values
: is a list of values to use in creating the grid search space. The type of each value of the list is general, i.e., they could be strings, integers, floats and anything else, even entire dictionaries.
Example:
title.cell_type:
space: grid_search
values: [rnn, gru, lstm]
More comprehensive example¶
hyperopt:
parameters:
trainer.learning_rate:
space: loguniform
lower: 0.001
upper: 0.1
combiner.num_fc_layers:
space: randint
lower: 2
upper: 6
title.cell_type:
space: grid_search
values: ["rnn", "gru"]
title.bidirectional:
space: choice
categories: [True, False]
title.fc_layers:
space: choice
categories:
- [{"output_size": 512}, {"output_size": 256}]
- [{"output_size": 512}]
- [{"output_size": 256}]
Search Algorithm¶
Ray Tune supports its own collection of search algorithms, specified by the search_alg
section of the hyperopt config:
search_alg:
type: variant_generator
You can find the full list of supported search algorithm names in Ray Tune's create_searcher function. Please note these algorithms require installation of additional packages. As of this version of Ludwig, Ludwig installs the packages for the search algorithm hyperopt
. For all other search algorithms, the user is expected to install the required packages.
Executor¶
Ray Tune Executor¶
The ray
executor is used to enable Ray Tune for both local and distributed hyperopt across a cluster of machines.
Parameters:
num_samples
: This parameter, along with thespace
specifications in theparameters
section, controls how many trials are generated (default: 1).
Note
- If all the hyperparameters in the
parameters
section have non-grid_search
specifications (e.g.,uniform
,randn
,choice
, etc.), then the number of trials will benum_samples
. - If all the hyperparameters have
grid_search
, then the number of trials will be the product of the number of values specified for each hyperparameter. In this case,num_samples
should be set to 1. For example, if there are threegrid_search
hyperparameters, with 2, 4 and 4 values, respectively. The number of trials will be 2 X 4 X 4 = 32, where each trial is a unique combination of the threegrid_search
hyperparameter values. - If there is a mixture of
grid_search
and non-grid_search
spaces, the number of trials will be product of the number of values specified for eachgrid_search
hyperpameter multiplied by the value ofnum_samples
. To illustrate this point, we take the threegrid_search
hyperparameters described in the preceding bullet item and add 2 hyperparameters withuniform
andrandint
spaces. Withnum_samples = 10
, for each unique combination of values from thegrid_search
hyperparameters, 10 trials will be generated with random values selected for theuniform
andrandint
hyperparameters. This will lead to a total of 32 X 10 = 320 trials.
cpu_resources_per_trial
: The number of CPU cores allocated to each trial (default: 1).gpu_resources_per_trial
: The number of GPU devices allocated to each trial (default: 0).kubernetes_namespace
: When running on Kubernetes, provide the namespace of the Ray cluster to sync results between pods. See the Ray docs for more info.time_budget_s
: The number of seconds for the entire hyperopt run.
Scheduler¶
Ray Tune also allows you to specify a scheduler to support features like early stopping and other population-based strategies that may pause and resume trials during trainer. Ludwig exposes the complete scheduler API in the scheduler
section of the executor
config.
You can find the full list of supported schedulers in Ray Tune's create_scheduler function.
Example:
executor:
type: ray
cpu_resources_per_trial: 2
gpu_resources_per_trial: 1
kubernetes_namespace: ray
time_budget_s: 7200
scheduler:
type: async_hyperband
time_attr: training_iteration
reduction_factor: 4
Running Ray Executor:
See the section on Running Ludwig with Ray for guidance on setting up your Ray cluster.
Full hyperparameter optimization example¶
Following is a full example of a Ludwig configuration with hyperparameter optimization.
Example YAML:
input_features:
-
name: title
type: text
encoder: rnn
cell_type: lstm
num_layers: 2
combiner:
type: concat
num_fc_layers: 1
output_features:
-
name: class
type: category
preprocessing:
text:
word_vocab_size: 10000
training:
learning_rate: 0.001
optimizer:
type: adam
hyperopt:
goal: maximize
output_feature: class
metric: accuracy
split: validation
parameters:
trainer.learning_rate:
space: loguniform
lower: 0.0001
upper: 0.1
trainer.optimizer.type:
space: choice
categories: [sgd, adam, adagrad]
preprocessing.text.word_vocab_size:
space: qrandint
lower: 700
upper: 1200
q: 5
combiner.num_fc_layers:
space: randint
lower: 1
upper: 5
title.cell_type:
space: choice
values: [rnn, gru, lstm]
search_alg:
type: random
executor:
type: ray
num_samples: 12
Example CLI command:
ludwig hyperopt --dataset reuters-allcats.csv --config_str "{input_features: [{name: title, type: text, encoder: rnn, cell_type: lstm, num_layers: 2}], output_features: [{name: class, type: category}], training: {learning_rate: 0.001}, hyperopt: {goal: maximize, output_feature: class, metric: accuracy, split: validation, parameters: {trainer.learning_rate: {space: loguniform, lower: 0.0001, upper: 0.1}, title.cell_type: {space: choice, categories: [rnn, gru, lstm]}}, search_alg: {type: variant_generator},executor: {type: ray, num_samples: 10}}}"