Add a Hyperopt Algorithm
The hyperparameter optimization design in Ludwig is based on two abstract interfaces: HyperoptSampler and HyperoptExecutor.
See Hyperopt configuration for examples of how the sampler and executor are configured.
HyperoptSampler
HyperoptSampler dictates how to sample hyperparameter values.
The sampler is configured by the sampler section of the hyperopt section of the Ludwig configuration.
Each hyperparameter that should be sampled is declared in hyperopt.parameters, which also specifies additional constraints that the sampler should honor. For example:
hyperopt:
  goal: minimize
  output_feature: combined
  metric: loss
  split: validation
  parameters:
    trainer.learning_rate:
      space: linear
      range:
        low: 0.001
        high: 0.1
      steps: 4
    text.fc_layers:
      space: choice
      categories:
        - [{"output_size": 512}, {"output_size": 256}]
        - [{"output_size": 512}]
        - [{"output_size": 256}]
Here, trainer.learning_rate is sampled continuously, while text.fc_layers is sampled discretely.
Note
Different HyperoptSamplers are described here.
HyperoptExecutor
HyperoptExecutor dictates how to execute the hyperparameter optimization, which operates independently of how hyperparameters are actually sampled.
A HyperoptExecutor uses a HyperoptSampler to sample hyperparameter values, usually initializes an execution context (a multithread pool, for instance), and executes the hyperparameter optimization according to the sampler.
First, a new batch of parameter values is sampled from the HyperoptSampler.
Then, the sampled parameter values are merged with the seed Ludwig configuration, with the sampled values overriding the seed's.
Training is executed, and validation losses and metrics are collected.
A (sampled_parameters, statistics) pair is provided to the HyperoptSampler.update function to inform the next sample of hyperparameters.
The loop is repeated until the sampler has no samples left to draw.
Finally, HyperoptExecutor.execute returns a list of dictionaries, each of which contains the sampled parameters, the metric scores, and other training, validation, and test statistics.
The returned list is printed and saved to disk, so that it can also be used as input to hyperparameter optimization visualizations.
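The loop described above can be sketched in a few lines. This is only an illustration and does not use Ludwig: FixedListSampler, merge_sampled_parameters, run_hyperopt_loop and fake_train are hypothetical names, and the merge step assumes that a dotted parameter name such as trainer.learning_rate addresses a nested dictionary key, which may differ from how Ludwig resolves feature names.

```python
import copy
from typing import Any, Callable, Dict, List


def merge_sampled_parameters(seed_config: Dict[str, Any], sampled: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of seed_config with the sampled values overriding it.

    Assumption: dotted names map to nested dictionary keys.
    """
    config = copy.deepcopy(seed_config)  # never mutate the seed configuration
    for dotted_name, value in sampled.items():
        *parents, leaf = dotted_name.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return config


class FixedListSampler:
    """Hypothetical stand-in for a HyperoptSampler that replays a fixed list."""

    def __init__(self, samples: List[Dict[str, Any]]):
        self._pending = list(samples)
        self.history = []

    def finished(self) -> bool:
        return not self._pending

    def sample_batch(self, batch_size: int = 1) -> List[Dict[str, Any]]:
        batch = self._pending[:batch_size]
        self._pending = self._pending[batch_size:]
        return batch

    def update(self, sampled_parameters: Dict[str, Any], metric_score: float) -> None:
        self.history.append((sampled_parameters, metric_score))


def run_hyperopt_loop(sampler, seed_config, train_fn: Callable[[Dict[str, Any]], float], batch_size: int = 2):
    """Sample, merge, train, update; repeat until the sampler is finished."""
    results = []
    while not sampler.finished():
        for sampled in sampler.sample_batch(batch_size):
            config = merge_sampled_parameters(seed_config, sampled)
            metric_score = train_fn(config)
            sampler.update(sampled, metric_score)
            results.append({"parameters": sampled, "metric_score": metric_score})
    return results


def fake_train(config: Dict[str, Any]) -> float:
    # Placeholder for real training: pretend the loss equals the learning rate.
    return config["trainer"]["learning_rate"]


seed = {"trainer": {"learning_rate": 0.01, "epochs": 5}}
sampler = FixedListSampler(
    [
        {"trainer.learning_rate": 0.001},
        {"trainer.learning_rate": 0.1},
        {"trainer.learning_rate": 0.05},
    ]
)
results = run_hyperopt_loop(sampler, seed, fake_train)
```

A real executor would replace fake_train with an actual training run and would typically dispatch each batch to its execution context rather than iterating serially.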
Note
Different HyperoptExecutors are described here.
Adding a HyperoptSampler
1. Add a new sampler class
The source code for the base HyperoptSampler class is in ludwig/hyperopt/sampling.py.
Classes extending the base class should be defined in this file.
__init__
def __init__(self, goal: str, parameters: Dict[str, Any]):
The parameters of the base HyperoptSampler class constructor are:
- goal, which indicates whether to minimize or maximize a metric or a loss of any of the output features on any of the splits, as defined in the hyperopt section.
- parameters, which contains all hyperparameters to optimize, with their types and ranges or values.
Example:
goal = "minimize"
parameters = {
    "training.learning_rate": {
        "type": "float",
        "low": 0.001,
        "high": 0.1,
        "steps": 4,
        "scale": "linear"
    },
    "combiner.num_fc_layers": {
        "type": "int",
        "low": 2,
        "high": 6,
        "steps": 3
    }
}
sampler = GridSampler(goal, parameters)
sample
def sample(self) -> Dict[str, Any]:
sample is a method that yields a new sample according to the sampler.
It returns a dictionary mapping parameter names to their sampled values.
If finished() returns True, calling sample raises an IndexError.
Example returned value:
{'training.learning_rate': 0.005, 'combiner.num_fc_layers': 2, 'utterance.cell_type': 'gru'}
sample_batch
def sample_batch(self, batch_size: int = 1) -> List[Dict[str, Any]]:
The sample_batch method returns a list of sampled parameters whose length is at most batch_size.
If finished() returns True, calling sample_batch raises an IndexError.
Example returned value:
[{'training.learning_rate': 0.005, 'combiner.num_fc_layers': 2, 'utterance.cell_type': 'gru'}, {'training.learning_rate': 0.015, 'combiner.num_fc_layers': 3, 'utterance.cell_type': 'lstm'}]
update
def update(
    self,
    sampled_parameters: Dict[str, Any],
    metric_score: float
):
update updates the sampler with the results of a previous computation.
- sampled_parameters is a dictionary of sampled parameters.
- metric_score is the value of the optimization metric obtained for the specified sample.
Stateless strategies like grid and random do not need this information, but stateful strategies like Bayesian and evolutionary ones do.
Example:
sampled_parameters = {
    'training.learning_rate': 0.005,
    'combiner.num_fc_layers': 2,
    'utterance.cell_type': 'gru'
}
metric_score = 2.53463
sampler.update(sampled_parameters, metric_score)
update_batch
def update_batch(
    self,
    parameters_metric_tuples: Iterable[Tuple[Dict[str, Any], float]]
):
update_batch updates the sampler with the results of previous computations in batch.
- parameters_metric_tuples is an iterable of pairs of sampled parameters and their respective metric values.
Stateless strategies like grid and random do not need this information, but stateful strategies like Bayesian and evolutionary ones do.
Example:
sampled_parameters = [
    {
        'training.learning_rate': 0.005,
        'combiner.num_fc_layers': 2,
        'utterance.cell_type': 'gru'
    },
    {
        'training.learning_rate': 0.015,
        'combiner.num_fc_layers': 5,
        'utterance.cell_type': 'lstm'
    }
]
metric_scores = [2.53463, 1.63869]
sampler.update_batch(zip(sampled_parameters, metric_scores))
finished
def finished(self) -> bool:
The finished method returns True when all samples have been sampled, and False otherwise.
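To make the interface concrete, here is a minimal sketch of a stateless sampler that implements the five methods described above. It is standalone and does not subclass Ludwig's HyperoptSampler; ToyGridSampler is a hypothetical name, and, unlike the parameters dictionary shown earlier, it assumes that each parameter is given as an explicit list of candidate values.

```python
import itertools
from typing import Any, Dict, Iterable, List, Tuple


class ToyGridSampler:
    """Hypothetical sampler that enumerates every combination of candidate values."""

    def __init__(self, goal: str, parameters: Dict[str, List[Any]]):
        self.goal = goal
        names = list(parameters)
        # Cartesian product of all candidate lists, one dict per combination.
        self._samples = [
            dict(zip(names, combination))
            for combination in itertools.product(*(parameters[name] for name in names))
        ]
        self._next_index = 0

    def sample(self) -> Dict[str, Any]:
        if self.finished():
            raise IndexError("all samples have already been sampled")
        sampled = self._samples[self._next_index]
        self._next_index += 1
        return sampled

    def sample_batch(self, batch_size: int = 1) -> List[Dict[str, Any]]:
        if self.finished():
            raise IndexError("all samples have already been sampled")
        batch = []
        # The last batch may be shorter than batch_size.
        while len(batch) < batch_size and not self.finished():
            batch.append(self.sample())
        return batch

    def update(self, sampled_parameters: Dict[str, Any], metric_score: float) -> None:
        # A grid is stateless: results do not influence later samples.
        pass

    def update_batch(
        self, parameters_metric_tuples: Iterable[Tuple[Dict[str, Any], float]]
    ) -> None:
        for sampled_parameters, metric_score in parameters_metric_tuples:
            self.update(sampled_parameters, metric_score)

    def finished(self) -> bool:
        return self._next_index >= len(self._samples)


toy_sampler = ToyGridSampler(
    "minimize",
    {
        "training.learning_rate": [0.001, 0.1],
        "combiner.num_fc_layers": [2, 4, 6],
    },
)
first_batch = toy_sampler.sample_batch(4)  # four of the six combinations
last_batch = toy_sampler.sample_batch(4)   # the remaining two
```

A stateful sampler would differ mainly in update, where it would record the metric scores and use them to choose later samples.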
2. Add the new sampler class to the corresponding sampler registry
The sampler_registry contains a mapping between sampler names in the hyperopt section of the model definition and HyperoptSampler sub-classes.
Add the new sampler to the registry:
sampler_registry = {
    "random": RandomSampler,
    "grid": GridSampler,
    ...,
    "new_sampler_name": NewSamplerClass
}
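As an illustration of how such a registry is typically consumed, the sketch below resolves a sampler name to a class. The three classes are empty placeholders standing in for the real ones, and get_sampler_class is a hypothetical helper, not a function taken from Ludwig.

```python
class RandomSampler:
    """Placeholder for the real class."""


class GridSampler:
    """Placeholder for the real class."""


class NewSamplerClass:
    """Placeholder for the sampler being added."""


sampler_registry = {
    "random": RandomSampler,
    "grid": GridSampler,
    "new_sampler_name": NewSamplerClass,
}


def get_sampler_class(name: str) -> type:
    """Look up a sampler class by the name used in the configuration."""
    try:
        return sampler_registry[name]
    except KeyError:
        available = ", ".join(sorted(sampler_registry))
        raise ValueError(f"Unknown sampler {name!r}. Available samplers: {available}") from None


sampler_class = get_sampler_class("new_sampler_name")
```

Once the entry is in the registry, the name on the left is what a user would write in the configuration to select the new class.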
Adding a HyperoptExecutor
1. Add a new executor class
The source code for the base HyperoptExecutor class is in the ludwig/utils/hyperopt_utils.py module. Classes extending the base class should be defined in the module.
__init__
def __init__(
    self,
    hyperopt_sampler: HyperoptSampler,
    output_feature: str,
    metric: str,
    split: str
):
The parameters of the base HyperoptExecutor class constructor are:
- hyperopt_sampler is a HyperoptSampler object that will be used to sample hyperparameter values.
- output_feature is a str containing the name of the output feature whose metric or loss we want to optimize. Available values are combined (default) or the name of any output feature provided in the model definition. combined is a special output feature that allows optimizing for the aggregated loss and metrics of all output features.
- metric is the metric that we want to optimize for. The default is loss, but depending on the type of the feature defined in output_feature, different metrics and losses are available. Check the metrics section of the specific output feature type to figure out what metrics are available to use.
- split is the split of data that we want to compute our metric on. By default it is the validation split, but you can also specify the train or test splits.
Example:
goal = "minimize"
parameters = {
    "training.learning_rate": {
        "type": "float",
        "low": 0.001,
        "high": 0.1,
        "steps": 4,
        "scale": "linear"
    },
    "combiner.num_fc_layers": {
        "type": "int",
        "low": 2,
        "high": 6,
        "steps": 3
    }
}
output_feature = "combined"
metric = "loss"
split = "validation"
grid_sampler = GridSampler(goal, parameters)
executor = SerialExecutor(grid_sampler, output_feature, metric, split)
execute
def execute(
    self,
    config,
    dataset=None,
    training_set=None,
    validation_set=None,
    test_set=None,
    training_set_metadata=None,
    data_format=None,
    experiment_name="hyperopt",
    model_name="run",
    model_load_path=None,
    model_resume_path=None,
    skip_save_training_description=False,
    skip_save_training_statistics=False,
    skip_save_model=False,
    skip_save_progress=False,
    skip_save_log=False,
    skip_save_processed_input=False,
    skip_save_unprocessed_output=False,
    skip_save_predictions=False,
    skip_save_eval_stats=False,
    output_directory="results",
    gpus=None,
    gpu_memory_limit=None,
    allow_parallel_threads=True,
    use_horovod=None,
    random_seed=default_random_seed,
    debug=False,
    **kwargs
):
The execute method executes the hyperparameter optimization.
It can leverage the run_experiment function to obtain training and eval statistics and the self.get_metric_score function to extract the metric score from the eval results according to self.output_feature, self.metric and self.split.
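The extraction step can be pictured with the sketch below. It is not Ludwig's get_metric_score: extract_metric_score is a hypothetical function, and the layout of the statistics dictionary (split, then output feature, then metric) as well as the choice to take the last value of a per-epoch list are assumptions made for illustration.

```python
from typing import Any, Dict


def extract_metric_score(
    statistics: Dict[str, Dict[str, Dict[str, Any]]],
    output_feature: str = "combined",
    metric: str = "loss",
    split: str = "validation",
) -> float:
    """Pick one scalar score out of nested statistics.

    Assumed layout: statistics[split][output_feature][metric].
    """
    value = statistics[split][output_feature][metric]
    if isinstance(value, (list, tuple)):
        # Assumption: a list holds one value per epoch, so use the most recent one.
        return float(value[-1])
    return float(value)


example_statistics = {
    "validation": {
        "combined": {"loss": [3.2, 2.9, 2.53463]},
        "intent": {"loss": [1.8, 1.5, 1.2], "accuracy": [0.6, 0.7, 0.75]},
    },
    "test": {"combined": {"loss": 2.61}},
}

validation_loss = extract_metric_score(example_statistics)
intent_accuracy = extract_metric_score(example_statistics, "intent", "accuracy")
test_loss = extract_metric_score(example_statistics, split="test")
```

The resulting score is what an executor would pass to the sampler's update method together with the sampled parameters.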
2. Add the new executor class to the corresponding executor registry
The executor_registry contains a mapping between executor names in the hyperopt section of the model definition and HyperoptExecutor sub-classes.
To make a new executor available, add it to the registry:
executor_registry = {
    "serial": SerialExecutor,
    "parallel": ParallelExecutor,
    "fiber": FiberExecutor,
    "new_executor_name": NewExecutorClass
}