Synthetic Benchmarks
All experimental code is publicly available here.
We use 9 synthetic benchmarks from HPOlib[5]; you can find more synthetic functions here.
We compare three optimizers in this section:
HyperOpt’s TPE[1] optimizer
UltraOpt’s ETPE optimizer
Random optimizer as baseline
We can see that UltraOpt’s ETPE optimizer outperforms HyperOpt’s TPE[1] optimizer on 5 of the 9 benchmarks and ties it on the remaining 4.
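For illustration, a minimal sketch of how such a comparison can be run with UltraOpt is shown below. This is not the linked experimental code: the Branin function stands in for the HPOlib synthetic benchmarks, and the `fmin` keyword names are assumed to follow UltraOpt's documented interface.

import numpy as np
import ConfigSpace as CS
from ultraopt import fmin

# search space of the Branin function
cs = CS.ConfigurationSpace()
cs.add_hyperparameters([
    CS.UniformFloatHyperparameter("x1", -5.0, 10.0),
    CS.UniformFloatHyperparameter("x2", 0.0, 15.0),
])

def branin(config: dict) -> float:
    x1, x2 = config["x1"], config["x2"]
    return (x2 - 5.1 / (4 * np.pi ** 2) * x1 ** 2 + 5 / np.pi * x1 - 6) ** 2 \
        + 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1) + 10

# run ETPE and the Random baseline with the same evaluation budget;
# HyperOpt's TPE is run through the hyperopt package in the linked code
for opt in ["ETPE", "Random"]:
    result = fmin(branin, cs, optimizer=opt, n_iterations=200)
    print(opt, result)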
Tabular Benchmarks
All experimental code is publicly available here.
Tabular Benchmarks[3] performed an exhaustive search for a large neural architecture search problem and compiled all architecture and performance pairs into a neural architecture search benchmark.
Tabular Benchmarks[3] uses 4 popular UCI[4] datasets (see Table 1 for an overview) for regression, with a two-layer feed-forward neural network followed by a linear output layer on top. The configuration space (shown in Table 2) includes only a modest number of 4 architectural choices (number of units and activation functions for both layers) and 5 hyperparameters (dropout rates per layer, batch size, initial learning rate and learning rate schedule), in order to allow for an exhaustive evaluation of all 62 208 configurations resulting from discretizing the hyperparameters as in Table 2. Tabular Benchmarks encodes numerical hyperparameters as ordinals and all other hyperparameters as categoricals.
Table 1: The four UCI[4] regression datasets.

| Dataset | # training datapoints | # validation datapoints | # test datapoints | # features |
|---|---|---|---|---|
| HPO-Bench-Protein | 27 438 | 9 146 | 9 146 | 9 |
| HPO-Bench-Slice | 32 100 | 10 700 | 10 700 | 385 |
| HPO-Bench-Naval | 7 160 | 2 388 | 2 388 | 15 |
| HPO-Bench-Parkinson | 3 525 | 1 175 | 1 175 | 20 |
Table 2: The configuration space.

| Hyperparameters | Choices |
|---|---|
| Initial LR | {.0005, .001, .005, .01, .05, .1} |
| Batch Size | {8, 16, 32, 64} |
| LR Schedule | {cosine, fix} |
| Activation/Layer 1 | {relu, tanh} |
| Activation/Layer 2 | {relu, tanh} |
| Layer 1 Size | {16, 32, 64, 128, 256, 512} |
| Layer 2 Size | {16, 32, 64, 128, 256, 512} |
| Dropout/Layer 1 | {0.0, 0.3, 0.6} |
| Dropout/Layer 2 | {0.0, 0.3, 0.6} |
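For concreteness, the Table 2 space can be written with the ConfigSpace package as sketched below, following the stated encoding (ordinals for the numerical hyperparameters, categoricals for the rest); the hyperparameter names here are our own shorthand rather than the benchmark's exact identifiers.

import ConfigSpace as CS

cs = CS.ConfigurationSpace()
cs.add_hyperparameters([
    # numerical hyperparameters are encoded as ordinals
    CS.OrdinalHyperparameter("init_lr", [0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]),
    CS.OrdinalHyperparameter("batch_size", [8, 16, 32, 64]),
    CS.OrdinalHyperparameter("n_units_1", [16, 32, 64, 128, 256, 512]),
    CS.OrdinalHyperparameter("n_units_2", [16, 32, 64, 128, 256, 512]),
    CS.OrdinalHyperparameter("dropout_1", [0.0, 0.3, 0.6]),
    CS.OrdinalHyperparameter("dropout_2", [0.0, 0.3, 0.6]),
    # all other hyperparameters are encoded as categoricals
    CS.CategoricalHyperparameter("lr_schedule", ["cosine", "fix"]),
    CS.CategoricalHyperparameter("activation_1", ["relu", "tanh"]),
    CS.CategoricalHyperparameter("activation_2", ["relu", "tanh"]),
])
# 6*4*2*2*2*6*6*3*3 = 62 208 configurations in total
print(cs)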
Based on the gathered data, we compare our optimization package to other optimizers such as HyperOpt[1] and HpBandSter[2] in two scenarios: (1) the Full-Budget Evaluation Strategy and (2) the HyperBand Evaluation Strategy.
Tabular Benchmarks[3] is publicly available here.
Full-Budget Evaluation Strategy
In this section we use the generated benchmarks to evaluate different HPO methods. To mimic the randomness that comes with evaluating a configuration, Tabular Benchmarks randomly samples one of the four stored performance values in each function evaluation. Tabular Benchmarks does not take the additional overhead of the optimizer into account, since it is negligible compared to the training time of the neural network.
After each function evaluation we estimate the incumbent as the configuration with the lowest observed error and compute the regret between the incumbent and the globally best configuration in terms of test error. Each method that operates on the full budget of 100 epochs was allowed to perform 200 function evaluations (200 iterations). We performed 20 independent runs of each method and report the median as well as the 25th and 90th quantiles.
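As a small illustration of this bookkeeping (our own helper, not part of the benchmark code), the test-regret trajectory can be computed as follows, where `observations` holds the (validation error, test error) pair returned by each function evaluation and `best_test_error` is the test error of the globally best configuration:

def test_regret_trajectory(observations, best_test_error):
    """observations: list of (validation_error, test_error) per evaluation."""
    trajectory = []
    incumbent_valid = float("inf")
    incumbent_test = float("inf")
    for valid_err, test_err in observations:
        # the incumbent is the configuration with the lowest observed error so far
        if valid_err < incumbent_valid:
            incumbent_valid, incumbent_test = valid_err, test_err
        # regret of the incumbent w.r.t. the globally best configuration,
        # measured in terms of test error
        trajectory.append(incumbent_test - best_test_error)
    return trajectory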
We compare three optimizers in this section:
HyperOpt’s TPE optimizer
UltraOpt’s ETPE optimizer
Random optimizer as baseline
The experimental code for HyperOpt’s TPE optimizer uses this.
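The linked script is the authoritative version; as a rough, hedged sketch, invoking HyperOpt’s TPE on a discretized space like Table 2 looks roughly as follows (`query_benchmark` is a hypothetical stand-in for the tabular-benchmark lookup):

from hyperopt import fmin, hp, tpe

space = {
    "batch_size": hp.choice("batch_size", [8, 16, 32, 64]),
    "init_lr": hp.choice("init_lr", [0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]),
    # ... the remaining Table 2 hyperparameters, all encoded with hp.choice
}

def query_benchmark(config):
    # hypothetical: look up `config` in the tabular benchmark and return
    # one of its stored validation errors
    return 0.0

best = fmin(fn=query_benchmark, space=space, algo=tpe.suggest, max_evals=200)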
We can see that UltraOpt’s ETPE optimizer outperforms HyperOpt’s TPE[1] optimizer on 3 of the 4 datasets and ties it on the remaining one.
HyperBand Evaluation Strategy
After evaluating the optimizers with the full-budget strategy and comparing their performance over iterations, we want to know how much improvement the HyperBand evaluation strategy[6] can bring, and we compare the optimizers’ performance over time.
To obtain a realistic estimate of the wall-clock time required for each optimizer, we accumulated the stored runtime of each configuration the optimizer evaluated.
For BOHB[2] and HyperBand[6] we set the minimum budget to 3 epochs, the maximum budget to 100, $\eta$ to 3 and the number of successive halving iterations to 250.
You can view the iteration table by entering the following code in IPython:
In [1]: from ultraopt.multi_fidelity import HyperBandIterGenerator
In [2]: HyperBandIterGenerator(min_budget=3, max_budget=100, eta=3)
Out[2]:
| | iter 0 | | | | iter 1 | | | iter 2 | | iter 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| | stage 0 | stage 1 | stage 2 | stage 3 | stage 0 | stage 1 | stage 2 | stage 0 | stage 1 | stage 0 |
| num_config | 27 | 9 | 3 | 1 | 9 | 3 | 1 | 6 | 2 | 4 |
| budget | 3.70 | 11.11 | 33.33 | 100.00 | 11.11 | 33.33 | 100.00 | 33.33 | 100.00 | 100.00 |
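The numbers in this table follow from the HyperBand schedule with $\eta = 3$: the budgets form the geometric sequence $100 / 3^k$ (3.70, 11.11, 33.33, 100), and each bracket keeps the top $1/\eta$ of its configurations at every successive-halving stage. Below is a small sketch of this arithmetic, mirroring HpBandSter's bracket construction rather than UltraOpt's internal code:

import math

eta, min_budget, max_budget = 3, 3, 100

# number of successive-halving stages in the largest bracket: how often the
# budget can be divided by eta before it drops below min_budget
max_sh_iter = -int(math.log(min_budget / max_budget) / math.log(eta)) + 1   # -> 4

for i, s in enumerate(range(max_sh_iter - 1, -1, -1)):       # brackets: iter 0 .. iter 3
    n0 = (max_sh_iter // (s + 1)) * eta ** s                  # configs entering the bracket
    num_config = [n0 // eta ** k for k in range(s + 1)]       # keep the top 1/eta each stage
    budget = [round(max_budget / eta ** (s - k), 2) for k in range(s + 1)]
    print(f"iter {i}: num_config={num_config}, budget={budget}")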
In addition to the three optimizers described above, three more optimizers are added in this section:
HpBandster’s BOHB optimizer[2]
UltraOpt’s BOHB optimizer
HyperBand as baseline
The experimental code for HpBandster’s BOHB optimizer uses this:
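As a hedged sketch of the usual HpBandSter setup (the linked experimental code is the authoritative version), a BOHB run might look like the following; `cs` is assumed to be the ConfigSpace object from above and `query_benchmark` is a hypothetical stand-in for the tabular-benchmark lookup:

import hpbandster.core.nameserver as hpns
from hpbandster.core.worker import Worker
from hpbandster.optimizers import BOHB

def query_benchmark(config, epochs):
    # hypothetical stand-in: return the stored validation error of `config`
    # trained for `epochs` epochs
    return 0.0

class BenchmarkWorker(Worker):
    def compute(self, config, budget, **kwargs):
        # HpBandSter passes the budget (here: epochs) to every evaluation
        return {"loss": query_benchmark(config, int(budget)), "info": {}}

NS = hpns.NameServer(run_id="bohb", host="127.0.0.1", port=None)
NS.start()

worker = BenchmarkWorker(nameserver="127.0.0.1", run_id="bohb")
worker.run(background=True)

bohb = BOHB(configspace=cs, run_id="bohb",
            min_budget=3, max_budget=100, eta=3)
res = bohb.run(n_iterations=250)

bohb.shutdown(shutdown_workers=True)
NS.shutdown()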
UltraOpt’s BOHB optimizer is implemented with the following code:
from ultraopt import fmin
from ultraopt.multi_fidelity import HyperBandIterGenerator

iter_generator = HyperBandIterGenerator(min_budget=3, max_budget=100, eta=3)
fmin_result = fmin(objective_function, cs, optimizer="ETPE",
                   multi_fidelity_iter_generator=iter_generator)
The HyperBand optimizer is implemented with the following code:
iter_generator = HyperBandIterGenerator(min_budget=3, max_budget=100, eta=3)
fmin_result = fmin(objective_function, cs, optimizer="Random",
multi_fidelity_iter_generator=iter_generator)
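Note that pairing the Random optimizer with the HyperBand iteration generator reproduces plain HyperBand, which samples its configurations uniformly at random; UltraOpt’s BOHB differs only in replacing that random sampling with the ETPE model.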
First, let’s draw some conclusions from the Protein Structure dataset’s benchmarks:
HyperBand achieves reasonable performance relatively quickly but eventually only slightly improves over simple Random Search.
BOHB performs as well as HyperBand in the beginning, but starts outperforming it as soon as it obtains a meaningful model.
UltraOpt’s BOHB performs better than HpBandSter’s BOHB.
References
[1] Bergstra, J., Bardenet, R., Bengio, Y., and Kégl, B. (2011). Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems (NIPS).
[2] Falkner, S., Klein, A., and Hutter, F. (2018). BOHB: Robust and efficient hyperparameter optimization at scale. In International Conference on Machine Learning (ICML).
[3] Klein, A. and Hutter, F. (2019). Tabular benchmarks for joint architecture and hyperparameter optimization. arXiv:1905.04970.
[4] Lichman, M. (2013). UCI machine learning repository.
[5] https://github.com/automl/HPOlib1.5/tree/development
[6] Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., and Talwalkar, A. (2018). Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research, 18(185):1–52.