Practical Bayesian Optimization of Machine Learning Algorithms
Introduction
BO
Bayesian optimization efficiently trades off exploration and exploitation of the parameter space to quickly guide the user toward the configuration that best optimizes some overall evaluation criterion (OEC), such as accuracy, AUC, or likelihood.
Bayesian optimization assumes the unknown function was sampled from a Gaussian process (GP) and maintains a posterior distribution over this function as results from runs of the learning algorithm are observed.
The acquisition function can be Expected Improvement (EI), Upper Confidence Bound (UCB), or Probability of Improvement (PI); the paper concentrates on EI.
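As a concrete illustration, here is a minimal sketch of EI for minimization, computed from the GP posterior mean `mu` and standard deviation `sigma` at candidate points (the names are illustrative, not from the paper):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """EI for minimization: E[max(f_best - f(x), 0)] under a Gaussian
    posterior with mean mu and standard deviation sigma."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive variance
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
```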
Hyperparameter optimization for ML differs from generic black-box optimization in two ways:
- In ML, different hyperparameter settings (e.g., the number of hidden units) can lead to very different evaluation times, so evaluation time needs to be considered.
- ML training can be run in parallel on multiple cores, so BO should exploit parallel computation.
Contributions
- Clarifies the relationship between the GP covariance function, the treatment of its hyperparameters, and optimization performance.
- Takes evaluation time into account via an expected-improvement-per-second criterion.
- Leverages multiple cores to run experiments in parallel during the BO process.
- Shows that the resulting algorithm improves on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms, including latent Dirichlet allocation, structured SVMs, and convolutional neural networks.
Practical Considerations for BO
Covariance Functions
The paper uses the ARD Matérn 5/2 kernel, arguing that the commonly used squared exponential kernel is unrealistically smooth for practical optimization problems.
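Concretely, with amplitude $\theta_0$ and per-dimension (ARD) length scales $\theta_d$, the kernel is:

$$
K_{\mathrm{M52}}(\mathbf{x},\mathbf{x}') = \theta_0\left(1+\sqrt{5r^2(\mathbf{x},\mathbf{x}')}+\frac{5}{3}r^2(\mathbf{x},\mathbf{x}')\right)\exp\!\left(-\sqrt{5r^2(\mathbf{x},\mathbf{x}')}\right),
\qquad
r^2(\mathbf{x},\mathbf{x}') = \sum_{d=1}^{D}\frac{(x_d-x'_d)^2}{\theta_d^2}
$$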
Modeling Costs
The paper proposes optimizing the expected improvement per second, which prefers points that are not only likely to be good but are also likely to be evaluated quickly.
The idea is to model the duration function c(x) alongside the objective function f(x), assuming c(x) and f(x) are independent; since durations are positive, ln c(x) is modeled with its own GP.
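A minimal sketch under these assumptions, where `mu_f` and `sigma_f` are the GP posterior over f(x), `mu_log_c` is the posterior mean of ln c(x), and the predicted duration is taken as exp(mu_log_c) (the paper's exact cost estimate may differ; all names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def ei_per_second(mu_f, sigma_f, f_best, mu_log_c):
    """Expected improvement per second: EI(x) divided by the
    predicted duration c(x), here estimated as exp(E[ln c(x)])."""
    sigma_f = np.maximum(sigma_f, 1e-12)
    z = (f_best - mu_f) / sigma_f
    ei = (f_best - mu_f) * norm.cdf(z) + sigma_f * norm.pdf(z)
    return ei / np.exp(mu_log_c)
```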
Parallelism
The goal is to decide which x should be evaluated next, even while a set of points is still being evaluated.
The paper proposes a sequential strategy that takes advantage of the tractable inference properties of the Gaussian process to compute Monte Carlo estimates of the acquisition function under the different possible outcomes of pending function evaluations. The authors found this Monte Carlo estimation procedure to be highly effective in practice.
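A minimal sketch of this fantasizing strategy using scikit-learn GPs, not the authors' implementation (which also integrates over kernel hyperparameters via slice sampling); all names are illustrative:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def mc_ei_with_pending(X_obs, y_obs, X_pending, X_cand, n_fantasies=10, seed=0):
    """Monte Carlo EI under pending jobs: sample 'fantasy' outcomes at
    pending points from the current GP posterior, condition on each
    sample, and average EI over the fantasies (minimization)."""
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_obs, y_obs)
    # Joint posterior samples at the pending locations, one column per fantasy.
    fantasies = gp.sample_y(X_pending, n_samples=n_fantasies, random_state=seed)
    acq = np.zeros(len(X_cand))
    for i in range(n_fantasies):
        X_aug = np.vstack([X_obs, X_pending])
        y_aug = np.concatenate([y_obs, fantasies[:, i]])
        # Condition on the fantasy without re-optimizing hyperparameters.
        gp_i = GaussianProcessRegressor(kernel=gp.kernel_, optimizer=None,
                                        normalize_y=True)
        gp_i.fit(X_aug, y_aug)
        mu, sigma = gp_i.predict(X_cand, return_std=True)
        sigma = np.maximum(sigma, 1e-12)
        z = (y_aug.min() - mu) / sigma
        acq += (y_aug.min() - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    return acq / n_fantasies
```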