BOHB: Robust and Efficient Hyperparameter Optimization at Scale

Posted on May 17, 2022 · 2 minute read

Introduction

Motivation

Modern deep learning methods are very sensitive to many hyperparameters, but current search methods have some limitations:

  1. Vanilla Bayesian hyperparameter optimization is computationally infeasible.
    • BO typically uses a Gaussian process (GP) as its probabilistic model, but GPs scale poorly to high dimensions and exhibit cubic complexity in the number of data points (poor scalability).
    • GPs require special kernels to apply to complex configuration spaces (poor flexibility).
  2. Bandit-based evaluation built on random search (Hyperband) lacks guidance and cannot converge to the best configurations quickly.
    • It only samples configurations randomly at each iteration and does not learn from previously sampled configurations.
    • This can lead to worse final performance than model-based approaches.

Contribution

The paper proposes the BOHB algorithm by combining BO and the bandit-based approach. BOHB can achieve strong anytime performance and fast convergence to optimal configurations.

It consistently outperforms both Bayesian optimization and Hyperband on a wide range of problem types (SVM, neural networks, Bayesian neural networks, deep RL, and CNNs).


Design target

  1. Strong Anytime Performance: HPO methods must yield good configurations even with a small budget.
  2. Strong Final Performance: given a large enough budget, the method should find configurations close to the global optimum.
  3. Effective use of parallel resources
  4. Scalability: the algorithm must handle problems ranging from just a few to many dozens of hyperparameters.
  5. Robustness & Flexibility: the algorithm must handle different types of hyperparameters: binary, integer, continuous, and categorical.

BOHB

BOHB relies on Hyperband (HB) to determine how many configurations to evaluate with which budget, but it replaces the random selection of configurations at the beginning of each HB iteration with a model-based search.
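Concretely, the model is a TPE-style density estimator: on the largest budget with enough observations, BOHB fits one kernel density estimate over the best configurations and one over the rest, then proposes the candidate that maximizes the ratio of the two densities. Below is a minimal sketch of that proposal step, assuming a purely continuous unit-hypercube search space; the paper itself uses multivariate KDEs with dedicated handling of categorical parameters and bandwidths, so the function name and fractions here are illustrative only.

```python
import numpy as np
from scipy.stats import gaussian_kde

def sample_candidate(X, y, n_candidates=64, top_frac=0.15, random_frac=0.1,
                     rng=np.random.default_rng(0)):
    """TPE-style proposal: pick the sampled candidate maximizing l(x)/g(x).

    X: (n, d) evaluated configurations in [0, 1]^d on the current budget;
    y: their observed losses. Returns one new configuration to evaluate.
    """
    n, d = X.shape
    # Fall back to random search until both KDEs can be fit, and with a small
    # probability afterwards to keep exploring (BOHB does the same).
    if n < 2 * (d + 2) or rng.random() < random_frac:
        return rng.random(d)
    order = np.argsort(y)
    n_good = max(d + 2, int(np.ceil(top_frac * n)))
    l = gaussian_kde(X[order[:n_good]].T)   # density of the good configurations
    g = gaussian_kde(X[order[n_good:]].T)   # density of the remaining ones
    cands = l.resample(n_candidates).T.clip(0.0, 1.0)  # search near good configs
    scores = l(cands.T) / np.maximum(g(cands.T), 1e-32)
    return cands[np.argmax(scores)]
```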


Parallelization


  1. The resulting model is shared across all SH runs.
  2. Each worker either samples a new configuration or runs the next pending SH evaluation in parallel.
  3. Different HB iterations can be started at the same time, so workers never sit idle (see the sketch below).
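The scheduling rule itself is simple, as the toy sketch below shows: each worker drains queued SH evaluations first and otherwise immediately starts a new configuration. The model stand-in, the synthetic objective, and all names here are my assumptions for illustration, not the paper's implementation.

```python
import queue
import random
import threading

MIN_BUDGET = 1
sh_jobs = queue.Queue()   # (config, budget) evaluations queued by running SH brackets
results = []              # shared observations the model would be refit on
lock = threading.Lock()

def propose():
    # Stand-in for the model-based sampler sketched earlier.
    return random.random()

def evaluate(config, budget):
    # Synthetic noisy objective; the noise shrinks as the budget grows.
    return (config - 0.5) ** 2 + random.gauss(0, 0.1 / budget)

def worker_loop(n_steps=20):
    """Prefer queued SH evaluations; if none is ready, start a new
    configuration (possibly opening a new HB iteration) instead of idling."""
    for _ in range(n_steps):
        try:
            config, budget = sh_jobs.get_nowait()
        except queue.Empty:
            config, budget = propose(), MIN_BUDGET
        loss = evaluate(config, budget)
        with lock:
            results.append((config, budget, loss))

workers = [threading.Thread(target=worker_loop) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```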


Evaluation

Counting ones

The paper defines the "counting ones" problem: maximize the number of ones across $N_{\text{cat}}$ binary and $N_{\text{cont}}$ continuous parameters, i.e. minimize

$$f(x) = -\left(\sum_{x_i \in X_{\text{cat}}} x_i + \sum_{x_j \in X_{\text{cont}}} x_j\right), \qquad x_i \in \{0,1\},\; x_j \in [0,1].$$

The continuous parameters are only observed through noise: each $x_j$ is estimated by the mean of Bernoulli($x_j$) draws, and the budget controls how many draws are used.
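A direct sketch of this objective (the function name and defaults are mine, not the paper's code):

```python
import numpy as np

def counting_ones(x_cat, x_cont, budget, rng=np.random.default_rng(0)):
    """Noisy counting-ones loss (to be minimized).

    x_cat: binary parameters in {0, 1}; x_cont: parameters in [0, 1].
    Each continuous parameter is only observed through the mean of
    `budget` Bernoulli draws, so a larger budget means less noise.
    """
    cont = sum(rng.binomial(1, p, size=budget).mean() for p in x_cont)
    return -(np.sum(x_cat) + cont)

# At the optimum (all parameters equal to 1) the loss reaches -(N_cat + N_cont):
print(counting_ones([1] * 8, [1.0] * 8, budget=100))   # -16.0
```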

This benchmark investigates BOHB’s behavior in high-dimensional mixed continuous/categorical configuration spaces (N_cat = 8 and N_cont = 8 parameters). The comparison also includes SMAC, since its random forests are known to perform well in high-dimensional categorical spaces. Test configuration:

  1. Budget: the number of Bernoulli samples used to evaluate the continuous parameters.
  2. For each method, 512 independent runs were performed, and the immediate regret of the best configuration found is reported.


SVM

The paper then measures SVM validation error under the different search algorithms. The search targets are the hyperparameters of an RBF-kernel SVM: the regularization parameter C and the kernel parameter γ.

The budget is the number of training data points used to fit each model.
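A budgeted evaluation here can be sketched with scikit-learn as below; the helper name, the dataset, and the split are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def evaluate_config(C, gamma, X_train, y_train, X_val, y_val, budget,
                    rng=np.random.default_rng(0)):
    """Train an RBF-kernel SVM on a `budget`-sized subsample of the training
    set and return its validation error (cheap but noisy for small budgets)."""
    idx = rng.choice(len(X_train), size=int(budget), replace=False)
    model = SVC(C=C, gamma=gamma, kernel="rbf").fit(X_train[idx], y_train[idx])
    return 1.0 - model.score(X_val, y_val)

X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
print(evaluate_config(C=1.0, gamma=1e-3, X_train=X_tr, y_train=y_tr,
                      X_val=X_val, y_val=y_val, budget=200))
```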


RL and BNN

Finally, the paper evaluates BOHB on Bayesian neural networks, on reinforcement learning (tuning eight hyperparameters of proximal policy optimization (PPO) for the cart-pole swing-up task), and on a CNN trained on CIFAR-10.

CNN

For the CNN on CIFAR-10, the paper runs BOHB with the following configuration:

  1. Search space: learning rate, momentum, weight decay, and batch size.
  2. Budget: number of training epochs (22, 66, 200, and 600).
  3. Workers: 19 parallel workers, each with 2 GPUs for parallel training.

The complete BOHB run of 16 iterations required a total of 33 GPU days and achieved a test error of 2.78%.
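For reference, the authors released BOHB in the hpbandster package; a minimal single-machine run looks roughly like the sketch below (based on the package's quickstart; `train_and_eval` is a hypothetical stand-in for the actual CIFAR-10 training loop).

```python
import ConfigSpace as CS
import ConfigSpace.hyperparameters as CSH
import hpbandster.core.nameserver as hpns
from hpbandster.core.worker import Worker
from hpbandster.optimizers import BOHB

class CifarWorker(Worker):
    def compute(self, config, budget, **kwargs):
        # `budget` is the number of training epochs for this evaluation.
        loss = train_and_eval(config, epochs=int(budget))  # hypothetical helper
        return {'loss': loss, 'info': {}}

cs = CS.ConfigurationSpace()
cs.add_hyperparameter(CSH.UniformFloatHyperparameter('lr', 1e-4, 1e-1, log=True))
cs.add_hyperparameter(CSH.UniformFloatHyperparameter('momentum', 0.0, 0.99))
cs.add_hyperparameter(CSH.UniformFloatHyperparameter('weight_decay', 1e-6, 1e-2, log=True))
cs.add_hyperparameter(CSH.UniformIntegerHyperparameter('batch_size', 32, 512, log=True))

ns = hpns.NameServer(run_id='cifar10', host='127.0.0.1', port=None)
ns.start()
worker = CifarWorker(run_id='cifar10', nameserver='127.0.0.1')
worker.run(background=True)

bohb = BOHB(configspace=cs, run_id='cifar10', nameserver='127.0.0.1',
            min_budget=22, max_budget=600)
result = bohb.run(n_iterations=16)
bohb.shutdown(shutdown_workers=True)
ns.shutdown()
```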




