Neural Architecture Search A Survey

Posted on January 23, 2022   4 minute read ∼ Filed in  : 

[JMLR-2019] Neural Architecture Search: A Survey

Problems:

  1. review some basic knowledge about Reinforcement learning, and Bayesian optimization.

Introduction

The paper categorizes methods for NAS according to three dimensions: Search space, Search strategy and Performance estimation strategy.

image-20220129213313581

search space

Incorporating prior knowledge about architecture and tasks can reduce the search space, but introduce human bias.

Search strategy:

  • Goal1: Find well-performing architectures as quickly as possible.
  • Goal2: Avoid convergence to a region of suboptimal architectures.

Performance Estimation Strategy

  • Goal1: Find architectures that achieve high predictive performance on the validation dataset.
  • Goal2: Reduce as much evaluation cost as possible

Search Space

Chain-structured Neural Networks

layer l0’s output is l1’s input.

search space is parameterized by

  1. number of layers n
  2. type or operations very layer execution, eg., pooling, convolution, etc
  3. Hyperparameters of each layer
  4. number of fully-connected networks.

multi-branch networks

input of layer i can be formally described as a function G() combining previous layer outputs

image-20220129220210912

Block/cell-based networks

Two kinds of cells: Normal cells that preserve dimensionality of input and reduction cell reduce spatial dimension.

Final architecture is built by stacking cells in predified manner.

image-20220129225010980

Advantages:

  1. Search space is reduced because cell has fewer layers.
  2. Architecture built from cells can more easily be transferred or adapted to other dataset.
  3. Creating architecture by repeating blocks is a more useful design.

Block/cell-based New design

How to choose the macro-architecture: how many cells shall be used and how should they be connected to build the actual model? Hard-coded macro architecture:

  • Each cell receives the outputs of the two preceding cells as input.
  • Manually designed architectures, eg., DenseNet

In general the cell based searching includes 3 steps:

  1. Define a set of primitive operations
  2. connect primitive operations and form the cell
  3. Hard-coded macro-architecture.

Search Strategy

Search strategies: random search, Bayesian optimization, evolutionary methods, reinforcement learning, and gradient-based methods.

Bayesian Optimization

Some papers derive kernel functions for architecture search spaces in order to use classic GP-based BO methods

Optimize both neural architectures and their hyperparameters jointly.

##

reinforcement learning method

Agent’s action: Generation of a neural architecture with action space: identical to the search space.

Agent’s reward: Estimate of the performance of the trained architecture on unseen data.

Different RL approaches differ in how they represent the agent’s policy and how they optimize it.

  1. Recurrent neural network (RNN) policy to sequentially sample a string that in turn encodes the neural architecture.
  2. Q-learning to train a policy which sequentially chooses a layer’s type and corresponding hyperparameters.

Sequential decision processes

state is the current (partially trained) architecture

reward is an estimate of the architecture’s performance

action corresponds to an application of function-preserving mutations followed by a training phase of the network.

Evolutionary method

Using gradient-based methods for optimizing weights and solely use evolutionary algorithms for optimizing the neural architecture itself.

in every evolution step, at least one model from the population is sampled and serves as a parent to generate offsprings by applying mutations to it. In the context of NAS, mutations are local operations, such as adding or removing a layer, altering the hyperparameters of a layer, adding skip connections, as well as altering training hyperparameters. After training the offsprings, their fitness (e.g., performance on a validation set) is evaluated and they are added to the population.

Challenge

  1. how to sample parents
  2. update populations
  3. generate offsprings

Compare

RL and evolution perform equally well in terms of final test accuracy, with evolution having better anytime performance and finding smaller models.

Random search test error = 3.9% on CIFAR-10 and a top-1 validation error of 21.0% on ImageNet.

Evolution-based method: 3.75% and 20.3% respectively.

Performance Estimation Strategy

To guide the search process, these search strategies need to estimate the performance of a given architecture A they consider.

image-20220130200229344

Lower fidelity estimates

Learning Curve extrapolation

Consider architectural hyperparameters for predicting which partial learning curves are most promising.

Challenge

The main challenge for predicting the performances of neural architectures is good predictions in a relatively large search space need to be made based on relatively few evaluations.

One-Shot Models

image-20220130202859073

Treats all architectures as different sub-graphs of a supergraph (the one-shot model) and shares weights between architectures that have edges of this supergraph in common.

Only the weights of a single one-shot model need to be trained (in one of various ways), and architectures (which are just subgraphs of the one-shot model) can then be evaluated without any separate training by inheriting trained weights from the one-shot model.

Challenge

How the one-shot model is trained.

Future Directions





END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.