BOHB: Robust and Efficient Hyperparameter Optimization at Scale
2 minute read ∼ Filed in: A paper note
Introduction
Motivation
Modern deep learning methods are very sensitive to many hyperparameters, but current search methods have several limitations:
- Vanilla Bayesian optimization (BO) is computationally infeasible at scale.
- BO typically uses a Gaussian process (GP) as its probabilistic model, but GPs do not scale well to high dimensions and exhibit cubic complexity in the number of data points (poor scalability).
- GPs require special kernels to handle complex configuration spaces (poor flexibility).
- Bandit-based methods built on random search (Hyperband) lack guidance and do not converge to the best configurations quickly.
- Hyperband only samples **configurations randomly** at each iteration and does not learn from previously sampled configurations.
- This can lead to worse final performance than model-based approaches.
Contribution
The paper proposes the BOHB algorithm by combining BO and the bandit-based approach. BOHB can achieve strong anytime performance and fast convergence to optimal configurations.
It consistently outperforms both Bayesian optimization and Hyperband on a wide range of problem types (SVM, NN, Bayesian NN, deep RL, CNN).
Design target
- Strong anytime performance: HPO methods must yield good configurations even with a small budget.
- Strong final performance
- Effective use of parallel resources
- Scalability: the algorithm must handle problems ranging from just a few to many dozens of hyperparameters.
- Robustness & flexibility: the algorithm must handle different types of hyperparameters (binary, integer, continuous, and categorical).
BOHB
BOHB relies on HB to determine how many configurations to evaluate with which budget, but it replaces the random selection of configurations at the beginning of each HB iteration with a model-based search.
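The model-based search is TPE-style: fit one kernel density estimate on the best observed configurations and one on the worst, then propose the candidate that maximizes their density ratio. Below is a minimal sketch of that suggestion step, assuming configurations live in a unit hypercube; the function name, defaults, and fallback thresholds are illustrative, not the paper's exact ones.

```python
import numpy as np
from scipy.stats import gaussian_kde

def suggest(observations, losses, n_candidates=64, top_frac=0.15,
            random_fraction=1/3, rng=None):
    """TPE-style suggestion step used by BOHB (sketch).

    observations: (n, d) array of configs already evaluated on the
    largest budget with enough results; losses: their validation losses.
    """
    rng = rng or np.random.default_rng()
    n, d = observations.shape
    # Too few points, or exploration step: fall back to random search,
    # as BOHB keeps a random fraction of samples.
    if n <= d + 2 or rng.random() < random_fraction:
        return rng.random(d)
    # Split observations into "good" and "bad" sets by loss quantile.
    k = max(d + 1, int(np.ceil(top_frac * n)))
    order = np.argsort(losses)
    good = gaussian_kde(observations[order[:k]].T)
    bad = gaussian_kde(observations[order[-k:]].T)
    # Sample candidates from the good KDE and pick the one maximizing
    # the density ratio l(x)/g(x) (an expected-improvement proxy).
    cands = np.clip(good.resample(n_candidates).T, 0.0, 1.0)
    ratio = good.pdf(cands.T) / np.maximum(bad.pdf(cands.T), 1e-32)
    return cands[np.argmax(ratio)]
```

Sampling from the good KDE rather than optimizing the ratio globally keeps the suggestion step cheap, which matters because BOHB calls it every time a worker frees up.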
Parallelization
- The resulting model is shared across all SH runs.
- Each free worker either samples a new configuration from the shared model or runs the next evaluation of a pending SH run in parallel.
- Parallelism is increased further by starting different SH iterations at the same time.
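The scheduling rule above can be sketched as a single decision made whenever a worker frees up; the data structures and names here are illustrative, not from the paper's implementation.

```python
from collections import deque

def assign_worker(sh_runs, sample_fn):
    """Decide what a newly free worker does (sketch of BOHB's rule).

    sh_runs: list of deques, one per in-flight successive-halving run,
    each holding configurations that still need evaluation on that
    run's current budget. sample_fn draws a new configuration from the
    model shared across all runs.
    """
    for run in sh_runs:
        if run:  # an SH run still has queued work: evaluate in parallel
            return "evaluate", run.popleft()
    # No queued work anywhere: start a new SH iteration by sampling
    # a fresh configuration from the shared model.
    return "sample", sample_fn()
```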
Evaluation
Counting ones
The paper defines a "counting ones" toy problem: minimize f(x) = −Σ xᵢ over binary (categorical) and continuous parameters in [0, 1], so the optimum sets every parameter to one.
This investigates BOHB's behavior in high-dimensional mixed continuous/categorical configuration spaces (Ncat = 8 and Ncont = 8 parameters). SMAC is included in the comparison since its random forests are known to perform well in high-dimensional categorical spaces. Test configurations:
- Budget: the number of samples
- For each method, 512 independent runs were performed and the immediate regret is reported.
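A noisy evaluation of this benchmark can be sketched as follows: each continuous value is estimated by averaging `budget` Bernoulli samples, so small budgets give cheap but noisy evaluations. The function name and signature are illustrative, assuming the budget-as-sample-count setup described above.

```python
import numpy as np

def counting_ones(x_cat, x_cont, budget, rng=None):
    """Noisy counting-ones objective (sketch of the paper's benchmark).

    True objective: f(x) = -(sum of binary bits + sum of continuous
    values); the optimum is all ones. On a finite budget, each
    continuous value x_i is estimated by averaging `budget` Bernoulli
    samples with success probability x_i.
    """
    rng = rng or np.random.default_rng()
    cont_est = sum(rng.binomial(budget, p) / budget for p in x_cont)
    return -(sum(x_cat) + cont_est)
```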
SVM
The paper then measures SVM error under the different search algorithms. The search targets are the hyperparameters of the RBF-kernel SVM (the regularization parameter C and the kernel parameter γ).
The budget is the number of training data points.
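Using training-set size as the budget can be sketched as a thin wrapper that fits each configuration on a random subsample, with the full dataset reserved for the highest budget. The helper and its signature are hypothetical; `train_and_score` stands in for any config-specific routine (e.g. fitting an RBF-kernel SVM with given C and γ) that returns a validation error.

```python
import numpy as np

def subset_budget_eval(train_and_score, X, y, budget, rng=None):
    """Evaluate one configuration on `budget` training points (sketch).

    Cheap, low-budget evaluations fit on a small random subset;
    only the largest budget uses the full training set.
    """
    rng = rng or np.random.default_rng()
    idx = rng.permutation(len(X))[:budget]  # random subsample
    return train_and_score(X[idx], y[idx])
```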
RL and BNN
Finally, the paper evaluates BOHB on Bayesian neural networks, reinforcement learning (tuning eight hyperparameters of proximal policy optimization to learn the cart-pole swing-up task), and a CNN task on CIFAR-10.
CNN
For the CNN on CIFAR-10, the paper runs BOHB with the following configuration:
- Search target: Learning rate, momentum, weight decay, and batch size.
- Budget: epochs (22, 66, 200, 600)
- 19 parallel workers, each with 2 GPUs for parallel training
The complete BOHB run of 16 iterations required a total of 33 GPU days and achieved a 2.78% test error.