Hyperopt
The hyperopt section of the Ludwig configuration defines which metric to optimize, which parameters to search over,
the search strategy, and the execution strategy.
hyperopt:
  goal: minimize
  output_feature: combined
  metric: loss
  split: validation
  parameters:
    title.encoder.cell_type: ... # title is a text feature type
    title.encoder.num_layers: ...
    combiner.num_fc_layers: ...
    section.encoder.embedding_size: ...
    preprocessing.text.vocab_size: ...
    trainer.learning_rate: ...
    trainer.optimizer.type: ...
    ...
  search_alg:
    type: variant_generator # random, hyperopt, bohb, ...
    # search_alg parameters...
  executor:
    type: ray
    num_samples: ...
    scheduler:
      type: fifo # hb_bohb, asynchyperband, ...
      # scheduler parameters...
Hyperopt configuration parameters
- goal: indicates whether to minimize or maximize a metric or a loss of any of the output features on any of the dataset splits. Available values are minimize (default) or maximize.
- output_feature: a str containing the name of the output feature whose metric or loss we want to optimize. Available values are combined (default) or the name of any output feature provided in the configuration. combined is a special output feature that allows optimizing the aggregated loss and metrics of all output features.
- metric: the metric that we want to optimize. The default is loss, but depending on the type of the feature defined in output_feature, different metrics and losses are available. Check the metrics section of the specific output feature type to see which metrics are available.
- split: the split of data that we want to compute our metric on. By default it is the validation split, but you can also specify the train or test split.
- parameters: a set of hyperparameters to optimize. They are provided as keys (the names of the parameters) and values associated with them (that define the search space). The values vary depending on the type of the hyperparameter. Syntax for this section is based on Ray Tune's Search Space parameters.
- search_alg: specifies the algorithm used to sample the defined parameters space. Candidate algorithms are those found in Ray Tune's Search Algorithms.
- executor: specifies how to execute the hyperparameter optimization. The execution can happen locally in a serial manner or in parallel across multiple workers, with GPUs if available. The executor section includes the specification for work scheduling and the number of samples to generate.
Defining hyperparameter search spaces
In the parameters section, hyperparameter names are dot-separated (.). The parts of a name separated by . are references to nested sections in the Ludwig configuration.
For instance, to reference the learning_rate in the trainer section, one would use the name trainer.learning_rate.
If the parameter to reference is inside an input or output feature, the name of that feature will be used as starting point.
For instance, for referencing the cell_type of the encoder for the title feature, use the name title.encoder.cell_type.
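The naming rule above can be illustrated with a small Python sketch. The helper set_dotted is hypothetical and is not part of Ludwig's API; it assumes that a leading name matching an input or output feature selects that feature, and that any other leading name refers to a top-level section of the configuration.

```python
from copy import deepcopy


def set_dotted(config, dotted_name, value):
    """Return a copy of config with the entry at dotted_name set to value.

    Hypothetical helper for illustration only, not Ludwig's implementation.
    """
    config = deepcopy(config)
    head, *rest = dotted_name.split(".")
    if not rest:
        raise ValueError("expected a dotted name such as trainer.learning_rate")

    # Assumption: a leading feature name selects that feature's section.
    features = config.get("input_features", []) + config.get("output_features", [])
    node = next((f for f in features if f.get("name") == head), None)
    if node is None:
        # Otherwise the leading name is treated as a top-level section.
        node = config.setdefault(head, {})

    # Walk (or create) the nested sections, then set the final key.
    for key in rest[:-1]:
        node = node.setdefault(key, {})
    node[rest[-1]] = value
    return config


base = {
    "input_features": [
        {"name": "title", "type": "text", "encoder": {"type": "rnn", "cell_type": "lstm"}}
    ],
    "output_features": [{"name": "class", "type": "category"}],
    "trainer": {"learning_rate": 0.001},
}

trial = set_dotted(base, "trainer.learning_rate", 0.01)
trial = set_dotted(trial, "title.encoder.cell_type", "gru")
```

Each call returns a new configuration and leaves the base configuration untouched, which mirrors the idea that every trial trains on its own variant of the config.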
Numeric Hyperparameters
- space: Use Ray Tune's Search Space types, e.g., uniform, quniform, loguniform, choice, etc. Refer to the cited page for details.

For numeric spaces, these define the range where the value is generated:

- lower: the minimum value the parameter can have
- upper: the maximum value the parameter can have
- q: quantization number, used in spaces such as quniform, qloguniform, qrandn, qrandint, qlograndint
- base: defines the base of the log for loguniform, qloguniform, lograndint and qlograndint
Note
Depending on the specific numeric space, the upper parameter may be inclusive or exclusive. Refer to the Ray Tune documentation for the specific distribution for details.
Float example: Uniform floating point random values (in log space) between 0.001 and 0.1
trainer.learning_rate:
  space: loguniform
  lower: 0.001
  upper: 0.1
Integer example: Uniform random integer values 1, 2, 3
combiner.num_fc_layers:
  space: randint
  lower: 1
  upper: 4
Quantized example: Uniform random floating point values such as 0, 0.1, 0.2, ..., 0.9
my_output_feature.decoder.dropout:
  space: quniform
  lower: 0
  upper: 1
  q: 0.1
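To build intuition for what the loguniform and randint spaces mean, here is a minimal sketch using only Python's standard library. It is not Ray Tune's implementation: the function sample_numeric is hypothetical, and it assumes the upper bound of randint is exclusive, which matches the integer example above (lower: 1, upper: 4 yields 1, 2, 3).

```python
import math
import random


def sample_numeric(space, lower, upper, rng=random):
    """Draw one value from a simplified numeric search space.

    Hypothetical illustration; Ray Tune's own samplers may differ in detail.
    """
    if space == "uniform":
        return rng.uniform(lower, upper)
    if space == "loguniform":
        # Sample uniformly in log space, then map back with exp, so each
        # order of magnitude between lower and upper is equally likely.
        return math.exp(rng.uniform(math.log(lower), math.log(upper)))
    if space == "randint":
        # Assumption: upper is exclusive, as in the integer example.
        return rng.randrange(lower, upper)
    raise ValueError(f"unsupported space: {space}")


rng = random.Random(0)  # seeded only to make the sketch reproducible
learning_rate = sample_numeric("loguniform", 0.001, 0.1, rng=rng)
num_fc_layers = sample_numeric("randint", 1, 4, rng=rng)
```

Sampling the learning rate in log space is the usual choice when plausible values span several orders of magnitude, since a plain uniform draw between 0.001 and 0.1 would rarely land below 0.01.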
Categorical Hyperparameters
- space: Use choice.
- categories: a list of possible values. The type of each value of the list is general, i.e., they could be strings, integers, floats and anything else, even entire dictionaries. The values will be a uniform random selection.
Example:
title.encoder.cell_type:
  space: choice
  categories: [rnn, gru, lstm]
Hyperparameters in a Grid
- space: Use grid_search.
- values: a list of values to use in creating the grid search space. The type of each value of the list is general, i.e., they could be strings, integers, floats and anything else, even entire dictionaries.
Example:
title.encoder.cell_type:
  space: grid_search
  values: [rnn, gru, lstm]
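When several hyperparameters use grid_search, the grid is the Cartesian product of their values. The sketch below enumerates such a grid with itertools.product. It illustrates the idea only and is not how Ray Tune generates variants internally; the second parameter (title.encoder.bidirectional as a grid) is an assumption added for the example.

```python
from itertools import product

# Two grid_search hyperparameters; the second one is illustrative.
grid = {
    "title.encoder.cell_type": ["rnn", "gru", "lstm"],
    "title.encoder.bidirectional": [True, False],
}

# One trial per combination of values: 3 x 2 = 6 combinations.
names = list(grid)
trials = [dict(zip(names, combo)) for combo in product(*grid.values())]

for trial in trials:
    print(trial)
```

Because the number of combinations grows multiplicatively with each added grid_search hyperparameter, grids are usually kept to a few parameters with a handful of values each.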
More comprehensive example
hyperopt:
  parameters:
    trainer.learning_rate:
      space: loguniform
      lower: 0.001
      upper: 0.1
    combiner.num_fc_layers:
      space: randint
      lower: 2
      upper: 6
    title.encoder.cell_type:
      space: grid_search
      values: ["rnn", "gru"]
    title.encoder.bidirectional:
      space: choice
      categories: [True, False]
    title.encoder.fc_layers:
      space: choice
      categories:
        - [{"output_size": 512}, {"output_size": 256}]
        - [{"output_size": 512}]
        - [{"output_size": 256}]
Default Hyperopt Parameters
In addition to defining hyperopt parameters for individual input or output features (like the title feature
in the example above), default parameters can be specified for entire feature types (for example, the encoder
to use for all text features in your dataset). Read more about default hyperopt parameters here.
Nested Ludwig Config Parameters
Ludwig also allows partial or full Ludwig configs to be sampled from the hyperopt search space. Read more about nested Ludwig config parameters here.
Search Algorithm
Ray Tune supports its own collection of search algorithms, specified by the search_alg section of the hyperopt config:
search_alg:
  type: variant_generator
You can find the full list of supported search algorithm names in Ray Tune's create_searcher function. Please note these algorithms require installation of additional packages. As of this version of Ludwig, Ludwig installs the packages for the search algorithm hyperopt. For all other search algorithms, the user is expected to install the required packages.
Executor
Ray Tune Executor
The ray executor is used to enable Ray Tune for both local and distributed hyperopt across a cluster of machines.
Parameters:
- num_samples: This parameter, along with the space specifications in the parameters section, controls how many trials are generated (default: 1).

  Note
  - If all the hyperparameters in the parameters section have non-grid_search specifications (e.g., uniform, randn, choice, etc.), then the number of trials will be num_samples.
  - If all the hyperparameters have grid_search, then the number of trials will be the product of the number of values specified for each hyperparameter. In this case, num_samples should be set to 1. For example, if there are three grid_search hyperparameters with 2, 4 and 4 values, respectively, the number of trials will be 2 X 4 X 4 = 32, where each trial is a unique combination of the three grid_search hyperparameter values.
  - If there is a mixture of grid_search and non-grid_search spaces, the number of trials will be the product of the number of values specified for each grid_search hyperparameter multiplied by the value of num_samples. To illustrate this point, we take the three grid_search hyperparameters described in the preceding bullet item and add 2 hyperparameters with uniform and randint spaces. With num_samples = 10, for each unique combination of values from the grid_search hyperparameters, 10 trials will be generated with random values selected for the uniform and randint hyperparameters. This will lead to a total of 32 X 10 = 320 trials.
- cpu_resources_per_trial: The number of CPU cores allocated to each trial (default: 1).
- gpu_resources_per_trial: The number of GPU devices allocated to each trial (default: 0).
- kubernetes_namespace: When running on Kubernetes, provide the namespace of the Ray cluster to sync results between pods. See the Ray docs for more info.
- time_budget_s: The number of seconds for the entire hyperopt run.
- max_concurrent_trials: The maximum number of trials to train concurrently. Defaults to auto if not specified.

  Note
  - If you're using a Ray backend, auto will infer the max_concurrent_trials that can be set given your cluster configuration to prevent trials from stalling.
  - If you're using a Local or Horovod backend, auto will set max_concurrent_trials to None.
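The trial-count rules for num_samples can be checked with a short calculation. The function expected_trials below is a hypothetical helper written for this illustration, not part of Ludwig or Ray Tune; it assumes the parameters section has been loaded as a Python dictionary and that grid_search entries list their candidates under values.

```python
from math import prod


def expected_trials(parameters, num_samples=1):
    """Number of trials implied by a parameters section and num_samples.

    Hypothetical helper: multiplies the sizes of all grid_search spaces
    and then multiplies the result by num_samples.
    """
    grid_sizes = [
        len(spec["values"])
        for spec in parameters.values()
        if spec.get("space") == "grid_search"
    ]
    # prod([]) is 1, so with no grid_search spaces the result is num_samples.
    return prod(grid_sizes) * num_samples


# Three grid_search hyperparameters (2, 4 and 4 values) plus two random ones.
# The parameter names are placeholders.
mixed = {
    "a": {"space": "grid_search", "values": [1, 2]},
    "b": {"space": "grid_search", "values": [1, 2, 3, 4]},
    "c": {"space": "grid_search", "values": ["w", "x", "y", "z"]},
    "d": {"space": "uniform", "lower": 0.0, "upper": 1.0},
    "e": {"space": "randint", "lower": 1, "upper": 5},
}

print(expected_trials(mixed, num_samples=10))  # 2 * 4 * 4 * 10 = 320
```

Running a calculation like this before launching a search is a cheap way to catch a configuration that would schedule far more trials than intended.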
Scheduler
Ray Tune also allows you to specify a scheduler to support features like early stopping and other population-based strategies that may pause and resume trials during training. Ludwig exposes the complete scheduler API in the scheduler section of the executor config.
You can find the full list of supported schedulers in Ray Tune's create_scheduler function.
Example:
executor:
  type: ray
  cpu_resources_per_trial: 2
  gpu_resources_per_trial: 1
  kubernetes_namespace: ray
  time_budget_s: 7200
  scheduler:
    type: async_hyperband
    time_attr: training_iteration
    reduction_factor: 4
Running Ray Executor:
See the section on Running Ludwig with Ray for guidance on setting up your Ray cluster.
Full hyperparameter optimization example
Following is a full example of a Ludwig configuration with hyperparameter optimization.
Example YAML:
input_features:
  -
    name: title
    type: text
    encoder:
      type: rnn
      cell_type: lstm
      num_layers: 2
combiner:
  type: concat
  num_fc_layers: 1
output_features:
  -
    name: class
    type: category
defaults:
  text:
    preprocessing:
      word_vocab_size: 10000
trainer:
  learning_rate: 0.001
  optimizer:
    type: adam
hyperopt:
  goal: maximize
  output_feature: class
  metric: accuracy
  split: validation
  parameters:
    trainer.learning_rate:
      space: loguniform
      lower: 0.0001
      upper: 0.1
    trainer.optimizer.type:
      space: choice
      categories: [sgd, adam, adagrad]
    preprocessing.text.word_vocab_size:
      space: qrandint
      lower: 700
      upper: 1200
      q: 5
    combiner.num_fc_layers:
      space: randint
      lower: 1
      upper: 5
    title.encoder.cell_type:
      space: choice
      categories: [rnn, gru, lstm]
  search_alg:
    type: random
  executor:
    type: ray
    num_samples: 12
Example CLI command:
ludwig hyperopt --dataset reuters-allcats.csv --config_str "{input_features: [{name: title, type: text, encoder: {type: rnn, cell_type: lstm, num_layers: 2}}], output_features: [{name: class, type: category}], trainer: {learning_rate: 0.001}, hyperopt: {goal: maximize, output_feature: class, metric: accuracy, split: validation, parameters: {trainer.learning_rate: {space: loguniform, lower: 0.0001, upper: 0.1}, title.encoder.cell_type: {space: choice, categories: [rnn, gru, lstm]}}, search_alg: {type: variant_generator}, executor: {type: ray, num_samples: 10}}}"