Synthetic Benchmarks

All experimental code is publicly available here.

We use 9 synthetic benchmarks from HPOlib[5]; you can find more synthetic functions here.

We compare three optimizers in this section (a minimal sketch of the comparison setup follows the list):

  1. HyperOpt’s TPE[1] optimizer

  2. UltraOpt’s ETPE optimizer

  3. Random optimizer as baseline
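
To make the setup concrete, here is a minimal, illustrative sketch of how such a comparison can be wired up through UltraOpt’s fmin interface. The quadratic objective and its search space are stand-ins rather than one of the HPOlib functions, and the iteration budget is arbitrary; HyperOpt’s TPE is run through the hyperopt library itself (see the Full-Budget section).

# Minimal sketch: ETPE vs. Random on a toy objective (a stand-in, not an HPOlib benchmark).
import ConfigSpace as CS
from ultraopt import fmin

cs = CS.ConfigurationSpace()
cs.add_hyperparameters([
    CS.UniformFloatHyperparameter("x", -5.0, 5.0),
    CS.UniformFloatHyperparameter("y", -5.0, 5.0),
])

def objective(config: dict) -> float:
    # fmin minimizes the returned value
    return (config["x"] - 1.0) ** 2 + (config["y"] + 2.0) ** 2

for name in ["ETPE", "Random"]:
    result = fmin(objective, cs, optimizer=name, n_iterations=100)
    print(name, result)  # the returned result object summarizes the best loss found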

Figure 1: Performance over Iterations in Synthetic Benchmarks

synthetic_benchmarks_log.png

We can see that UltraOpt’s ETPE optimizer outperforms HyperOpt’s TPE[1] optimizer on 5 of the 9 benchmarks and ties with it on the remaining 4.

Tabular Benchmarks

All experimental code is publicly available here.

Tabular Benchmarks[3] performed an exhaustive search over a large neural architecture search problem and compiled all architecture and performance pairs into a tabular neural architecture search benchmark.

Tabular Benchmarks[3] uses 4 popular UCI[4] regression datasets (see Table 1 for an overview) and trains a two-layer feed-forward neural network followed by a linear output layer. The configuration space (shown in Table 2) includes only a modest number of 4 architectural choices (number of units and activation function for both layers) and 5 hyperparameters (dropout rate per layer, batch size, initial learning rate, and learning rate schedule) in order to allow an exhaustive evaluation of all 62,208 configurations resulting from discretizing the hyperparameters as in Table 2. Tabular Benchmarks encodes numerical hyperparameters as ordinals and all other hyperparameters as categoricals (a sketch of this encoding is given after Table 2).

Table 1: Dataset splits

Dataset               # training datapoints   # validation datapoints   # test datapoints   # features
HPO-Bench-Protein                    27,438                     9,146               9,146            9
HPO-Bench-Slice                      32,100                    10,700              10,700          385
HPO-Bench-Naval                       7,160                     2,388               2,388           15
HPO-Bench-Parkinson                   3,525                     1,175               1,175           20

Table 2: Configuration space of the fully connected neural network

Hyperparameters       Choices
Initial LR            {0.0005, 0.001, 0.005, 0.01, 0.05, 0.1}
Batch Size            {8, 16, 32, 64}
LR Schedule           {cosine, fix}
Activation/Layer 1    {relu, tanh}
Activation/Layer 2    {relu, tanh}
Layer 1 Size          {16, 32, 64, 128, 256, 512}
Layer 2 Size          {16, 32, 64, 128, 256, 512}
Dropout/Layer 1       {0.0, 0.3, 0.6}
Dropout/Layer 2       {0.0, 0.3, 0.6}
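
To make this encoding concrete, here is a sketch of the Table 2 space in ConfigSpace, with numerical hyperparameters as ordinals and everything else as categoricals. The hyperparameter names are ours for illustration; in practice the configuration space shipped with the benchmark should be used.

import ConfigSpace as CS

# Illustrative reconstruction of Table 2 (names are ours, not necessarily the benchmark's).
cs = CS.ConfigurationSpace()
cs.add_hyperparameters([
    CS.OrdinalHyperparameter("init_lr", [5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1]),
    CS.OrdinalHyperparameter("batch_size", [8, 16, 32, 64]),
    CS.CategoricalHyperparameter("lr_schedule", ["cosine", "fix"]),
    CS.CategoricalHyperparameter("activation_1", ["relu", "tanh"]),
    CS.CategoricalHyperparameter("activation_2", ["relu", "tanh"]),
    CS.OrdinalHyperparameter("n_units_1", [16, 32, 64, 128, 256, 512]),
    CS.OrdinalHyperparameter("n_units_2", [16, 32, 64, 128, 256, 512]),
    CS.OrdinalHyperparameter("dropout_1", [0.0, 0.3, 0.6]),
    CS.OrdinalHyperparameter("dropout_2", [0.0, 0.3, 0.6]),
])
# 6*4*2*2*2*6*6*3*3 = 62,208 configurations, matching the exhaustive grid.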

Based on the gathered data, we compare our optimization package with other optimizers such as HyperOpt[1] and HpBandSter[2] in two scenarios: (1) the Full-Budget Evaluation Strategy and (2) the HyperBand Evaluation Strategy.

Tabular Benchmarks[3] is publicly available here.

Full-Budget Evaluation Strategy

In this section we use the generated benchmarks to evaluate different HPO methods. To mimic the randomness that comes with evaluating a configuration, Tabular Benchmarks randomly samples one of the four stored performance values in each function evaluation. Tabular Benchmarks does not take the additional overhead of the optimizer into account, since it is negligible compared to the training time of the neural network.

After each function evaluation we estimate the incumbent as the configuration with the lowest observed error and compute the regret between the incumbent and the globally best configuration in terms of test error. Each method that operates on the full budget of 100 epochs was allowed to perform 200 function evaluations (200 iterations). We performed 20 independent runs of each method and report the median as well as the 25th and 90th percentiles.
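
A library-free sketch of this bookkeeping, where propose_config stands for the optimizer’s next suggestion, evaluate for one noisy benchmark lookup returning (validation error, test error), and best_test_error for the test error of the globally best configuration (all three are assumed helpers); the incumbent is chosen by the observed validation error while the regret is measured in test error:

import numpy as np

def run_regret_trace(propose_config, evaluate, best_test_error, n_iterations=200):
    """Track the incumbent by observed error and record the test regret per iteration."""
    incumbent_valid, incumbent_test = np.inf, np.inf
    regret = []
    for _ in range(n_iterations):
        config = propose_config()                   # next configuration from the optimizer
        valid_error, test_error = evaluate(config)  # one noisy table lookup
        if valid_error < incumbent_valid:           # new incumbent: lowest observed error so far
            incumbent_valid, incumbent_test = valid_error, test_error
        regret.append(incumbent_test - best_test_error)
    return regret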

We compare three optimizers in this section:

  1. HyperOpt’s TPE optimizer

  2. UltraOpt’s ETPE optimizer

  3. Random optimizer as baseline

The experimental code for HyperOpt’s TPE optimizer uses this.
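
The linked code is not reproduced here, but the essential TPE call in HyperOpt looks roughly as follows; the search space is a trimmed stand-in for Table 2 and benchmark_lookup is an assumed helper that returns the (noisy) validation error of a configuration:

from hyperopt import Trials, fmin, hp, tpe

# Trimmed stand-in for the Table 2 space, just to show the TPE call pattern.
space = {
    "init_lr": hp.choice("init_lr", [5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1]),
    "batch_size": hp.choice("batch_size", [8, 16, 32, 64]),
    "n_units_1": hp.choice("n_units_1", [16, 32, 64, 128, 256, 512]),
}

def objective(params):
    return benchmark_lookup(params)  # assumed helper: noisy validation error of this config

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=200, trials=trials)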

Figure 2: Performance over Iterations

tabular_benchmarks.png

We can see that UltraOpt’s ETPE optimizer outperforms HyperOpt’s TPE[1] optimizer on 3 of the 4 benchmarks and ties with it on the remaining one.

HyperBand Evaluation Strategy

After evaluating the optimizers under the full-budget strategy and comparing their performance over iterations, we want to know how much improvement the HyperBand evaluation strategy[6] can bring, so we compare the optimizers’ performance over time.

To obtain a realistic estimate of the wall-clock time required for each optimizer, we accumulated the stored runtime of each configuration the optimizer evaluated.
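
A short sketch of that accumulation, assuming each evaluation returns both a loss and the stored training runtime of the evaluated configuration:

import numpy as np

def performance_over_time(losses, runtimes):
    """Simulate wall-clock time by accumulating stored runtimes and track the incumbent."""
    wallclock = np.cumsum(runtimes)            # simulated wall-clock time after each evaluation
    incumbent = np.minimum.accumulate(losses)  # best loss observed so far
    return wallclock, incumbent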

For BOHB[2] and HyperBand[6] we set the minimum budget to 3 epochs, the maximum budget to 100 epochs, $\eta$ to 3, and the number of successive halving iterations to 250.
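
These settings yield four successive halving brackets. The following sketch reproduces the bracket sizes and budgets shown in the iteration table below, using the HpBandSter-style bracket sizing, which UltraOpt’s iteration generator matches here:

import math

def hyperband_brackets(min_budget=3, max_budget=100, eta=3):
    # s_max = floor(log_eta(max_budget / min_budget)) = 3 for these settings
    s_max = int(math.floor(math.log(max_budget / min_budget, eta)))
    brackets = []
    for s in range(s_max, -1, -1):
        n0 = ((s_max + 1) // (s + 1)) * eta ** s                    # configs started in this bracket
        n_configs = [n0 // eta ** i for i in range(s + 1)]          # survivors after each halving
        budgets = [round(max_budget / eta ** (s - i), 2) for i in range(s + 1)]
        brackets.append((n_configs, budgets))
    return brackets

for n_configs, budgets in hyperband_brackets():
    print(n_configs, budgets)
# expected output:
# [27, 9, 3, 1] [3.7, 11.11, 33.33, 100.0]
# [9, 3, 1]     [11.11, 33.33, 100.0]
# [6, 2]        [33.33, 100.0]
# [4]           [100.0]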

You can view the iteration table by entering the following code in IPython:

In [1]: from ultraopt.multi_fidelity import HyperBandIterGenerator
In [2]: HyperBandIterGenerator(min_budget=3, max_budget=100, eta=3)
Out[2]: 
            |             iter 0             |         iter 1         |     iter 2     | iter 3
            | stage 0 stage 1 stage 2 stage 3| stage 0 stage 1 stage 2| stage 0 stage 1| stage 0
num_config  |      27       9       3       1|       9       3       1|       6       2|       4
budget      |    3.70   11.11   33.33  100.00|   11.11   33.33  100.00|   33.33  100.00|  100.00

In addition to the three optimizers described above, three more optimizers are added in this section:

  1. HpBandSter’s BOHB optimizer[2]

  2. UltraOpt’s BOHB optimizer

  3. HyperBand as baseline

The experimental code for HpBandSter’s BOHB optimizer uses this.
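
The linked code is not reproduced here; a minimal HpBandSter setup looks roughly like the following sketch, assuming an objective_function(config, budget) and a ConfigSpace object cs as in the UltraOpt snippets below (run_id and host values are arbitrary):

import hpbandster.core.nameserver as hpns
from hpbandster.core.worker import Worker
from hpbandster.optimizers import BOHB

class TabularWorker(Worker):
    def compute(self, config, budget, **kwargs):
        # one benchmark lookup at the given epoch budget (objective_function is an assumed helper)
        loss = objective_function(config, budget=int(budget))
        return {"loss": float(loss), "info": {}}

NS = hpns.NameServer(run_id="tabular", host="127.0.0.1", port=None)
NS.start()
worker = TabularWorker(run_id="tabular", nameserver="127.0.0.1")
worker.run(background=True)

bohb = BOHB(configspace=cs, run_id="tabular", nameserver="127.0.0.1",
            min_budget=3, max_budget=100, eta=3)
result = bohb.run(n_iterations=250)
bohb.shutdown(shutdown_workers=True)
NS.shutdown()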

UltraOpt’s BOHB optimizer is implemented with the following code:

from ultraopt import fmin
from ultraopt.multi_fidelity import HyperBandIterGenerator

iter_generator = HyperBandIterGenerator(min_budget=3, max_budget=100, eta=3)
fmin_result = fmin(objective_function, cs, optimizer="ETPE",
                   multi_fidelity_iter_generator=iter_generator)

The HyperBand optimizer is implemented with the following code:

iter_generator = HyperBandIterGenerator(min_budget=3, max_budget=100, eta=3)
fmin_result = fmin(objective_function, cs, optimizer="Random",
                    multi_fidelity_iter_generator=iter_generator)
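
In other words, both BOHB and plain HyperBand are expressed through the same fmin interface in UltraOpt: the HyperBandIterGenerator supplies the multi-fidelity schedule, and the optimizer argument determines whether new configurations are proposed by a model ("ETPE", giving BOHB) or drawn at random (HyperBand).
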
Figure 3: Performance over Time (Protein Structure)

protein_structure_HB.png

First, let’s draw some conclusions from the Protein Structure dataset’s benchmarks:

  • HyperBand achieves reasonable performance relatively quickly, but eventually improves only slightly over simple Random Search.

  • BOHB is as good as HyperBand in the beginning, but starts outperforming it as soon as it obtains a meaningful model.

  • UltraOpt’s BOHB performs better than HpBandSter’s BOHB.

Figure 4: Performance over Time (Slice Localization)

slice_localization_HB.png

Figure 5: Performance over Time (Naval Propulsion)

naval_propulsion_HB.png

Figure 6: Performance over Time (Parkinsons Telemonitoring)

parkinsons_telemonitoring_HB.png


References

[1] James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. 2011. Algorithms for hyper-parameter optimization. In Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS’11). Curran Associates Inc., Red Hook, NY, USA, 2546–2554.

[2] Falkner, Stefan et al. “BOHB: Robust and Efficient Hyperparameter Optimization at Scale.” ICML (2018).

[3] Klein, A. and F. Hutter. “Tabular Benchmarks for Joint Architecture and Hyperparameter Optimization.” arXiv preprint arXiv:1905.04970 (2019).

[4] Lichman, M. (2013). UCI Machine Learning Repository.

[5] https://github.com/automl/HPOlib1.5/tree/development

[6] Li, L. et al. “Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization.” J. Mach. Learn. Res. 18 (2017): 185:1-185:52.