BOHB: Robust and Efficient Hyperparameter Optimization at Scale
2 minute read ∼ Filed in: A paper note
Introduction
Motivation
Modern deep learning methods are very sensitive to many hyperparameters, but current search methods have several limitations:
- Vanilla Bayesian optimization (BO) is computationally infeasible at scale.
- BO typically uses a Gaussian process (GP) as its probabilistic model, but GPs do not scale well to high dimensions and exhibit cubic complexity in the number of data points (poor scalability).
- GPs require special kernels to handle complex configuration spaces (poor flexibility).
- Bandit-based methods built on random search (Hyperband) lack guidance and do not converge to the best configurations quickly.
- Hyperband only samples **configurations randomly** at each iteration and does not learn from previously sampled configurations.
- This can lead to worse final performance than model-based approaches.
Contribution
The paper proposes the BOHB algorithm by combining BO and the bandit-based approach. BOHB can achieve strong anytime performance and fast convergence to optimal configurations.
It consistently outperforms both Bayesian optimization and Hyperband on a wide range of problem types (SVM, NN, Bayesian NN, deep RL, CNN).
Design target
- Strong anytime performance: HPO methods must yield good configurations even with a small budget.
- Strong final performance
- Effective use of parallel resources
- Scalability: the algorithm must handle problems ranging from just a few to many dozens of hyperparameters.
- Robustness & flexibility: the algorithm must handle different types of hyperparameters (binary, integer, continuous, and categorical).
BOHB
BOHB relies on HB to determine how many configurations to evaluate with which budget, but it replaces the random selection of configurations at the beginning of each HB iteration with a model-based search.
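The model-based search is TPE-style: fit one kernel density estimate on the best observed configurations and one on the worst, then propose the candidate that maximizes their density ratio. Below is a minimal sketch of that suggestion step, assuming configurations live in a unit hypercube; the function name, defaults, and fallback thresholds are illustrative, not the paper's exact ones.

```python
import numpy as np
from scipy.stats import gaussian_kde

def suggest(observations, losses, n_candidates=64, top_frac=0.15,
            random_fraction=1/3, rng=None):
    """TPE-style suggestion step used by BOHB (sketch).

    observations: (n, d) array of configs already evaluated on the
    largest budget with enough results; losses: their validation losses.
    """
    rng = rng or np.random.default_rng()
    n, d = observations.shape
    # Too few points, or exploration step: fall back to random search,
    # as BOHB keeps a random fraction of samples.
    if n <= d + 2 or rng.random() < random_fraction:
        return rng.random(d)
    # Split observations into "good" and "bad" sets by loss quantile.
    k = max(d + 1, int(np.ceil(top_frac * n)))
    order = np.argsort(losses)
    good = gaussian_kde(observations[order[:k]].T)
    bad = gaussian_kde(observations[order[-k:]].T)
    # Sample candidates from the good KDE and pick the one maximizing
    # the density ratio l(x)/g(x) (an expected-improvement proxy).
    cands = np.clip(good.resample(n_candidates).T, 0.0, 1.0)
    ratio = good.pdf(cands.T) / np.maximum(bad.pdf(cands.T), 1e-32)
    return cands[np.argmax(ratio)]
```

Sampling from the good KDE rather than optimizing the ratio globally keeps the suggestion step cheap, which matters because BOHB calls it every time a worker frees up.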
Parallelization
- The resulting model is shared across all SH runs.
- Each free worker either samples a new configuration from the shared model or runs the next evaluation of a pending SH run in parallel.
- Parallelism is increased further by starting different SH iterations at the same time.
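The scheduling rule above can be sketched as a single decision made whenever a worker frees up; the data structures and names here are illustrative, not from the paper's implementation.

```python
from collections import deque

def assign_worker(sh_runs, sample_fn):
    """Decide what a newly free worker does (sketch of BOHB's rule).

    sh_runs: list of deques, one per in-flight successive-halving run,
    each holding configurations that still need evaluation on that
    run's current budget. sample_fn draws a new configuration from the
    model shared across all runs.
    """
    for run in sh_runs:
        if run:  # an SH run still has queued work: evaluate in parallel
            return "evaluate", run.popleft()
    # No queued work anywhere: start a new SH iteration by sampling
    # a fresh configuration from the shared model.
    return "sample", sample_fn()
```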
Evaluation
Counting ones
The paper defines a "counting ones" toy problem: minimize f(x) = −Σ xᵢ over binary (categorical) and continuous parameters in [0, 1], so the optimum sets every parameter to one.
This investigates BOHB's behavior in high-dimensional mixed continuous/categorical configuration spaces (Ncat = 8 and Ncont = 8 parameters). SMAC is included in the comparison since its random forests are known to perform well in high-dimensional categorical spaces. Test configurations:
- Budget: the number of samples
- For each method, 512 independent runs were performed and the immediate regret is reported.
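A noisy evaluation of this benchmark can be sketched as follows: each continuous value is estimated by averaging `budget` Bernoulli samples, so small budgets give cheap but noisy evaluations. The function name and signature are illustrative, assuming the budget-as-sample-count setup described above.

```python
import numpy as np

def counting_ones(x_cat, x_cont, budget, rng=None):
    """Noisy counting-ones objective (sketch of the paper's benchmark).

    True objective: f(x) = -(sum of binary bits + sum of continuous
    values); the optimum is all ones. On a finite budget, each
    continuous value x_i is estimated by averaging `budget` Bernoulli
    samples with success probability x_i.
    """
    rng = rng or np.random.default_rng()
    cont_est = sum(rng.binomial(budget, p) / budget for p in x_cont)
    return -(sum(x_cat) + cont_est)
```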
SVM
The paper then measures SVM error under the different search algorithms. The search targets are the hyperparameters of the RBF-kernel SVM (the regularization parameter C and the kernel parameter γ).
The budget is the number of training data points.
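Using training-set size as the budget can be sketched as a thin wrapper that fits each configuration on a random subsample, with the full dataset reserved for the highest budget. The helper and its signature are hypothetical; `train_and_score` stands in for any config-specific routine (e.g. fitting an RBF-kernel SVM with given C and γ) that returns a validation error.

```python
import numpy as np

def subset_budget_eval(train_and_score, X, y, budget, rng=None):
    """Evaluate one configuration on `budget` training points (sketch).

    Cheap, low-budget evaluations fit on a small random subset;
    only the largest budget uses the full training set.
    """
    rng = rng or np.random.default_rng()
    idx = rng.permutation(len(X))[:budget]  # random subsample
    return train_and_score(X[idx], y[idx])
```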
RL and BNN
Finally, the paper evaluates BOHB on Bayesian neural networks, reinforcement learning (tuning eight hyperparameters of proximal policy optimization to learn the cart-pole swing-up task), and a CNN task on CIFAR-10.
CNN
For the CNN on CIFAR-10, the paper runs BOHB with the following configuration:
- Search target: Learning rate, momentum, weight decay, and batch size.
- Budget: epochs (22, 66, 200, 600)
- 19 parallel workers, each with 2 GPUs for parallel training
The complete BOHB run of 16 iterations required a total of 33 GPU days and achieved a 2.78% test error.