Practical Bayesian Optimization of Machine Learning Algorithms

Posted on May 14, 2022 · Filed in: A paper note

Introduction

Bayesian Optimization (BO)

Bayesian optimization efficiently trades off exploration and exploitation of the parameter space to quickly guide the user toward the configuration that best optimizes some overall evaluation criterion (OEC), such as accuracy, AUC, or likelihood.

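Bayesian optimization assumes that the unknown objective function was sampled from a Gaussian process (GP). It maintains a posterior distribution over this function that is updated as results come in from running the learning algorithm with different hyperparameter settings.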
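To make "maintains a posterior distribution" concrete, here is a minimal NumPy sketch of a zero-mean GP posterior over a 1-D hyperparameter. It is an illustration under stated assumptions, not the paper's implementation: the squared-exponential kernel with fixed unit length scale and amplitude, the small fixed noise term, and the helper names `rbf_kernel` and `gp_posterior` are all hypothetical choices made for brevity.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, amplitude=1.0):
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return amplitude * np.exp(-0.5 * sq_dists / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at each row of x_test.

    Hypothetical helper: kernel hyperparameters are fixed, not inferred.
    """
    k_train = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_cross = rbf_kernel(x_train, x_test)
    chol = np.linalg.cholesky(k_train)  # k_train = chol @ chol.T
    # alpha = k_train^{-1} y, via the two Cholesky factors.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross.T @ alpha
    v = np.linalg.solve(chol, k_cross)
    var = np.diag(rbf_kernel(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off


# Three observed validation errors (lower is better) for one hyperparameter.
x_obs = np.array([[0.1], [0.5], [0.9]])
y_obs = np.array([0.30, 0.12, 0.25])
mean, var = gp_posterior(x_obs, y_obs, np.array([[0.5], [3.0]]))
```

The posterior mean reproduces the observation at an already-evaluated setting, and the variance is near zero there but large far from the data. Acquisition functions exploit exactly this contrast.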

Common acquisition functions include probability of improvement (PI), expected improvement (EI), and the GP upper confidence bound (UCB).
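The three acquisition functions can be written in closed form from the GP posterior mean mu and standard deviation sigma at a candidate x. The sketch below assumes minimization, so `best` is the lowest objective value observed so far and UCB appears in its minimization form as a lower confidence bound. The exploration weight `kappa=2.0` and the function names are illustrative assumptions.

```python
import math


def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    # erfc keeps good precision in the lower tail.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def probability_of_improvement(mu, sigma, best):
    """PI (minimization): P(f(x) < best) when f(x) ~ N(mu, sigma^2)."""
    gamma = (best - mu) / sigma
    return _norm_cdf(gamma)


def expected_improvement(mu, sigma, best):
    """EI (minimization): E[max(best - f(x), 0)] when f(x) ~ N(mu, sigma^2)."""
    gamma = (best - mu) / sigma
    return sigma * (gamma * _norm_cdf(gamma) + _norm_pdf(gamma))


def lower_confidence_bound(mu, sigma, kappa=2.0):
    """GP-UCB in minimization form: smaller values are more promising."""
    return mu - kappa * sigma
```

With `best = 0.12`, an uncertain candidate (`mu=0.15, sigma=0.10`) receives a higher EI than a confident but only slightly worse one (`mu=0.13, sigma=0.01`). This is how EI trades exploration against exploitation.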


How ML differs from other black-box optimization

In ML, different hyperparameter settings (e.g., the number of hidden units) can lead to very different training times, so the cost of each evaluation needs to be taken into account.
ML experiments are often run in parallel on multiple cores or machines, so BO should be able to exploit parallel evaluations.

Contributions

Clarify how the choice of covariance function, and the treatment of its hyperparameters, affects BO performance.
Take the evaluation time of each experiment into account.
Leverage multiple cores to run experiments in parallel during the BO process.
Show that the resulting algorithm improves on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms, including latent Dirichlet allocation, structured SVMs, and convolutional neural networks.

Practical Considerations for BO

Covariance Functions

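The paper uses the automatic relevance determination (ARD) Matérn 5/2 kernel, arguing that the commonly used ARD squared-exponential kernel produces sample functions that are unrealistically smooth for practical optimization problems.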
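For reference, the ARD Matérn 5/2 kernel is K(x, x') = θ0 (1 + √(5 r²) + (5/3) r²) exp(−√(5 r²)), where r² = Σ_d (x_d − x'_d)² / θ_d² and each input dimension d has its own length scale θ_d. The NumPy sketch below simply takes the amplitude θ0 and the length scales as fixed inputs. How they are chosen or inferred is outside this sketch, and the function name is hypothetical.

```python
import numpy as np


def matern52_ard(a, b, length_scales, amplitude=1.0):
    """ARD Matern 5/2 kernel between rows of a (n, d) and b (m, d).

    Each input dimension has its own length scale; a very large length
    scale makes the kernel (and hence the GP) insensitive to that dimension.
    """
    scaled_a = a / length_scales
    scaled_b = b / length_scales
    # r2[i, j] = sum_d (a_id - b_jd)^2 / length_scale_d^2
    r2 = np.sum((scaled_a[:, None, :] - scaled_b[None, :, :]) ** 2, axis=-1)
    root = np.sqrt(5.0 * r2)
    return amplitude * (1.0 + root + 5.0 / 3.0 * r2) * np.exp(-root)
```

The "relevance determination" comes from the per-dimension length scales. A hyperparameter that barely affects the objective can be assigned a huge θ_d, so differences along it hardly change the covariance.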

Modeling costs

The paper proposes optimizing the expected improvement per second, which favors points that are not only likely to be good but also likely to be evaluated quickly.

The idea is to model the duration function c(x) alongside the objective function f(x), assuming that c(x) and f(x) are independent.
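The following is a minimal sketch of EI per second, written under explicit assumptions. The duration is taken to come from a separate model of ln c(x), which keeps predicted durations positive and is one natural choice. Its posterior mean `log_duration_mean` is mapped back to seconds with exp(). Using a single point estimate of the duration is a simplification: a fuller treatment would average over the uncertainty in c(x). All names are hypothetical.

```python
import math


def expected_improvement(mu, sigma, best):
    """EI for minimization under the objective GP's N(mu, sigma^2)."""
    gamma = (best - mu) / sigma
    cdf = 0.5 * math.erfc(-gamma / math.sqrt(2.0))
    pdf = math.exp(-0.5 * gamma * gamma) / math.sqrt(2.0 * math.pi)
    return sigma * (gamma * cdf + pdf)


def ei_per_second(mu, sigma, best, log_duration_mean):
    """EI divided by a predicted evaluation time in seconds.

    Assumes an independent model of ln c(x) whose posterior mean at x is
    log_duration_mean; exp() turns it into a duration estimate.
    """
    predicted_seconds = math.exp(log_duration_mean)
    return expected_improvement(mu, sigma, best) / predicted_seconds
```

Two candidates with identical objective posteriors but predicted run times of 5 s and 60 s get the same EI. After dividing by the predicted duration, the faster one is preferred by a factor of 12.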

Parallelism

The goal is to decide which x to evaluate next even while a set of other points is still being evaluated.

The paper proposes a sequential strategy that takes advantage of the tractable inference properties of the Gaussian process to compute Monte Carlo estimates of the acquisition function, averaged over the possible outcomes of the pending function evaluations.

The authors found this Monte Carlo estimation procedure to be highly effective in practice.
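The Monte Carlo strategy for pending evaluations can be sketched in a few steps. For each draw, jointly sample "fantasy" outcomes for the pending points from the current GP posterior. Then condition on those outcomes as if they had been observed, compute EI at a candidate, and average the results. Assumptions that are not taken from the paper: a zero-mean GP with a fixed squared-exponential kernel (length scale 0.3) and tiny noise, used in place of a Matérn 5/2 kernel with inferred hyperparameters; EI as the base acquisition function; the incumbent `best` taken over observed plus fantasized values; and hypothetical helper names.

```python
import math

import numpy as np


def rbf(a, b, length_scale=0.3):
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def posterior(x, y, x_new, noise=1e-6):
    """Zero-mean GP posterior mean and covariance at the rows of x_new."""
    k_xx = rbf(x, x) + noise * np.eye(len(x))
    k_xn = rbf(x, x_new)
    solved = np.linalg.solve(k_xx, np.column_stack([y, k_xn]))
    mean = k_xn.T @ solved[:, 0]
    cov = rbf(x_new, x_new) - k_xn.T @ solved[:, 1:]
    return mean, 0.5 * (cov + cov.T)  # symmetrize round-off


def expected_improvement(mu, sigma, best):
    gamma = (best - mu) / sigma
    cdf = 0.5 * math.erfc(-gamma / math.sqrt(2.0))
    pdf = math.exp(-0.5 * gamma**2) / math.sqrt(2.0 * math.pi)
    return max(0.0, sigma * (gamma * cdf + pdf))  # clamp round-off


def mc_ei_with_pending(x, y, x_pending, x_cand, n_samples=100, seed=0):
    """Average EI at x_cand over simulated outcomes of pending points."""
    rng = np.random.default_rng(seed)
    mean_p, cov_p = posterior(x, y, x_pending)
    jitter = 1e-9 * np.eye(len(x_pending))
    x_aug = np.vstack([x, x_pending])
    total = 0.0
    for _ in range(n_samples):
        # One joint fantasy outcome for every pending evaluation.
        y_fantasy = rng.multivariate_normal(mean_p, cov_p + jitter)
        y_aug = np.concatenate([y, y_fantasy])
        mu, cov = posterior(x_aug, y_aug, x_cand[None, :])
        sigma = math.sqrt(max(cov[0, 0], 1e-12))
        total += expected_improvement(mu[0], sigma, y_aug.min())
    return total / n_samples
```

A candidate placed on top of a pending point scores almost zero, because every fantasy already "knows" its value. A candidate far from all observed and pending points keeps a high score. This is what prevents parallel workers from redundantly picking the same x.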
