Practical Bayesian Optimization of Machine Learning Algorithms

Posted on May 14, 2022   1 minute read ∼ Filed in  : 

Introduction

Bo

Bayesian optimization efficiently trades off exploration and exploitation of the parameter space to quickly guide the user into the configuration that best optimizes some overall evaluation criteria (OEC) like accuracy, AUC, or likelihood.

Bayesian optimization assumes the unknown function was sampled from a GP and maintains a posterior distribution for this function as a result of running learning algorithms.

Acquisition Functions can be EI, UCB, PI.

image-20220516154927618

Examples:

image-20220516162603934

ML differs from other BlackBox optimization.

  1. In ML, different parameters (number of hidden units) may result in different evaluation times. Evaluation time needs to be considered.
  2. ML runs in parallel on multiple cores. Parallelly computing should be used in BO.

Contribution

  1. Make clear the relationship between the covariance function and the hyperparameters.
  2. Take evaluation time into consideration
  3. The paper leverages multiple cores for parallel experiments in BO process.
  4. The result shows the alrogirhtm in the paper improves on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs, and convolutional neural networks.

Practical Consideration for BO

Covariance Functions

The paper used ARD 5/2 kernel

image-20220516160439083

Modeling costs

The paper proposes optimizing with the expected improvement per second, which prefers to acquire points that are not only likely to be good, but that is also likely to be evaluated quickly.

The idea it to model the duration function c(x) along with object function f(x). And it assume c(x) and f(x) are indenpendent.

Parallelism

Try to decide what x should be evaluated next, even while a set of points are being evaluated

The paper proposes a sequential strategy that takes advantage of the tractable inference properties of the Gaussian process to compute Monte Carlo estimates of the acquisition function under different possible results from pending function evaluations

image-20220516161957191

They found our Monte Carlo estimation procedure to be highly effective in practice.





END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.