Hyper-parameter Optimization
The hyper-parameter optimization design in Ludwig is based on two abstract interfaces: HyperoptSampler
and HyperoptExecutor
.
HyperoptSampler
represents the sampler adopted for sampling hyper-parameters values.
Which sampler to use is defined in the sampler
section of the model definition.
A Sampler
uses the parameters
defined in the hyperopt
section of the YAML model definition and a goal
, either to minimize or maximize.
Each sub-class of HyperoptSampler
that implements its abstract methods samples parameters according to their definition and type differently (see User Guide for details), like using a random search (implemented in RandomSampler
), or a grid serach (implemented in GridSampler
, or bayesian optimization or evolutionary techniques.
HyperoptExecutor
represents the method used to execute the hyper-parameter optimization, independently of how the values for the hyperparameters are sampled.
Available implementations are a serial executor that executes the training with the different sampled hyper-parameters values one at a time (implemented in SerialExecutor
), a parallel executor that runs the training using sampled hyper-parameters values in parallel on the same machine (implemented in the ParallelExecutor
), and a Fiber-based executor that enables to run the training using sampled hyper-parameters values in parallel on multiple machines within a cluster.
A HyperoptExecutor
uses a HyperoptSampler
to sample hyper-parameters values, usually initializes an execution context, like a multithread pool for instance, and executes the hyper-parameter optimization according to the sampler.
First, a new batch of parameters values is sampled from the HyperoptSampler
.
Then, sampled parameters values are merged with the basic model definition parameters specified, with the sampled parameters values overriding the ones in the basic model definition they refer to.
Training is executed using the merged model definition and training and validation losses and metrics are collected.
A (sampled_parameters, statistics)
pair is provided to the HyperoptSampler.update
function and the loop is repeated until all the samples are sampled.
At the end, HyperoptExecutor.execute
returns a list of dictionaries that include a parameter sample, its metric score, and its training and test statistics.
The returned list is printed and saved to disk, so that it can also be used as input to hyper-parameter optimization visualizations.
Adding a HyperoptSampler¶
1. Add a new sampler class¶
The source code for the base HyperoptSampler
class is in the ludwig/hyperopt/sampling.py
module.
Classes extending the base class should be defined in the same module.
__init__
¶
def __init__(self, goal: str, parameters: Dict[str, Any]):
The parameters of the base HyperoptStrategy
class constructor are:
goal
which indicates if to minimize or maximize a metric or a loss of any of the output features on any of the splits which is defined in thehyperopt
sectionparameters
which contains all hyper-parameters to optimize with their types and ranges / values.
Example:
goal = "minimize"
parameters = {
"training.learning_rate": {
"type": "float",
"low": 0.001,
"high": 0.1,
"steps": 4,
"scale": "linear"
},
"combiner.num_fc_layers": {
"type": "int",
"low": 2,
"high": 6,
"steps": 3
}
}
sampler = GridSampler(goal, parameters)
sample
¶
def sample(self) -> Dict[str, Any]:
sample
is a method that yields a new sample according to the sampler.
It returns a set of parameters names and their values.
If finished()
returns True
, calling sample
would return a IndexError
.
Example returned value:
{'training.learning_rate': 0.005, 'combiner.num_fc_layers': 2, 'utterance.cell_type': 'gru'}
sample_batch
¶
def sample_batch(self, batch_size: int = 1) -> List[Dict[str, Any]]:
sample_batch
method returns a list of sampled parameters of length equal to or less than batch_size
.
If finished()
returns True
, calling sample_batch
would return a IndexError
.
Example returned value:
[{'training.learning_rate': 0.005, 'combiner.num_fc_layers': 2, 'utterance.cell_type': 'gru'}, {'training.learning_rate': 0.015, 'combiner.num_fc_layers': 3, 'utterance.cell_type': 'lstm'}]
update
¶
def update(
self,
sampled_parameters: Dict[str, Any],
metric_score: float
):
update
updates the sampler with the results of previous computation.
sampled_parameters
is a dictionary of sampled parameters.metric_score
is the value of the optimization metric obtained for the specified sample.
It is not needed for stateless strategies like grid and random, but is needed for stateful strategies like bayesian and evolutionary ones.
Example:
sampled_parameters = {
'training.learning_rate': 0.005,
'combiner.num_fc_layers': 2,
'utterance.cell_type': 'gru'
}
metric_score = 2.53463
sampler.update(sampled_parameters, metric_score)
update_batch
¶
def update_batch(
self,
parameters_metric_tuples: Iterable[Tuple[Dict[str, Any], float]]
):
update_batch
updates the sampler with the results of previous computation in batch.
parameters_metric_tuples
a list of pairs of sampled parameters and their respective metric value.
It is not needed for stateless strategies like grid and random, but is needed for stateful strategies like bayesian and evolutionary ones.
Example:
sampled_parameters = [
{
'training.learning_rate': 0.005,
'combiner.num_fc_layers': 2,
'utterance.cell_type': 'gru'
},
{
'training.learning_rate': 0.015,
'combiner.num_fc_layers': 5,
'utterance.cell_type': 'lstm'
}
]
metric_scores = [2.53463, 1.63869]
sampler.update_batch(zip(sampled_parameters, metric_scores))
finished
¶
def finished(self) -> bool:
The finished
method return True
when all samples have been sampled, return False
otherwise.
2. Add the new sampler class to the corresponding sampler registry¶
The sampler_registry
contains a mapping between sampler
names in the hyperopt
section of model definition and HyperoptSampler
sub-classes.
To make a new sampler available, add it to the registry:
sampler_registry = {
"random": RandomSampler,
"grid": GridSampler,
...,
"new_sampler_name": NewSamplerClass
}
Adding a HyperoptExecutor¶
1. Add a new executor class¶
The source code for the base HyperoptExecutor
class is in the ludwig/utils/hyperopt_utils.py
module.
Classes extending the base class should be defined in the module.
__init__
¶
def __init__(
self,
hyperopt_sampler: HyperoptSampler,
output_feature: str,
metric: str,
split: str
)
The parameters of the base HyperoptExecutor
class constructor are
hyperopt_sampler
is aHyperoptSampler
object that will be used to sample hyper-parameters valuesoutput_feature
is astr
containing the name of the output feature that we want to optimize the metric or loss of. Available values arecombined
(default) or the name of any output feature provided in the model definition.combined
is a special output feature that allows to optimize for the aggregated loss and metrics of all output features.metric
is the metric that we want to optimize for. The default one isloss
, but depending on the tye of the feature defined inoutput_feature
, different metrics and losses are available. Check the metrics section of the specific output feature type to figure out what metrics are available to use.split
is the split of data that we want to compute our metric on. By default it is thevalidation
split, but you have the flexibility to specify alsotrain
ortest
splits.
Example:
goal = "minimize"
parameters = {
"training.learning_rate": {
"type": "float",
"low": 0.001,
"high": 0.1,
"steps": 4,
"scale": "linear"
},
"combiner.num_fc_layers": {
"type": "int",
"low": 2,
"high": 6,
"steps": 3
}
}
output_feature = "combined"
metric = "loss"
split = "validation"
grid_sampler = GridSampler(goal, parameters)
executor = SerialExecutor(grid_sampler, output_feature, metric, split)
execute
¶
def execute(
self,
config,
dataset=None,
training_set=None,
validation_set=None,
test_set=None,
training_set_metadata=None,
data_format=None,
experiment_name="hyperopt",
model_name="run",
model_load_path=None,
model_resume_path=None,
skip_save_training_description=False,
skip_save_training_statistics=False,
skip_save_model=False,
skip_save_progress=False,
skip_save_log=False,
skip_save_processed_input=False,
skip_save_unprocessed_output=False,
skip_save_predictions=False,
skip_save_eval_stats=False,
output_directory="results",
gpus=None,
gpu_memory_limit=None,
allow_parallel_threads=True,
use_horovod=None,
random_seed=default_random_seed,
debug=False,
**kwargs
):
The execute
method executes the hyper-parameter optimization.
It can leverage the run_experiment
function to obtain training and eval statistics and the self.get_metric_score
function to extract the metric score from the eval results according to self.output_feature
, self.metric
and self.split
.
2. Add the new executor class to the corresponding executor registry¶
The executor_registry
contains a mapping between executor
names in the hyperopt
section of model definition and HyperoptExecutor
sub-classes.
To make a new executor available, add it to the registry:
executor_registry = {
"serial": SerialExecutor,
"parallel": ParallelExecutor,
"fiber": FiberExecutor,
"new_executor_name": NewExecutorClass
}