Efficient Neural Architecture Search via Parameter Sharing
Introduction
Current Problems
The bottleneck of NAS is the training of each child model from scratch to convergence.
Contribution
ENAS constructs a large computational graph, where each subgraph represents a neural network architecture.
The main contribution of this work is to improve the efficiency of NAS by forcing all child models to share weights, which avoids training each child model from scratch to convergence.
Method
ENAS represents NAS's search space as a single directed acyclic graph (DAG), where each child architecture can be viewed as a sub-graph of the larger graph. Since the weights live in the large graph, all sub-graphs share the same parameters.
Each node is an operation.
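As a rough illustration (not the authors' implementation), the shared-weight idea can be sketched in PyTorch: every possible (node, operation) pair owns one set of parameters, and a sampled child architecture only uses a subset of them. The node count, operation list, and hidden size below are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

NUM_NODES = 4
OPS = ("tanh", "relu", "sigmoid", "identity")
HIDDEN = 32

class SharedDAG(nn.Module):
    """All child models index into one shared parameter table."""
    def __init__(self):
        super().__init__()
        # One linear transform per (node, operation) pair; these weights are
        # shared by every child architecture that uses that pair.
        self.weights = nn.ModuleDict({
            f"{node}_{op}": nn.Linear(HIDDEN, HIDDEN)
            for node in range(NUM_NODES) for op in OPS
        })

    def forward(self, x, architecture):
        # `architecture` is a list of (prev_node, op) decisions, one per node,
        # i.e. the sub-graph selected by the controller.
        activations = {"tanh": torch.tanh, "relu": torch.relu,
                       "sigmoid": torch.sigmoid, "identity": lambda t: t}
        outputs = [x]
        for node, (prev, op) in enumerate(architecture):
            h = self.weights[f"{node}_{op}"](outputs[prev])
            outputs.append(activations[op](h))
        return outputs[-1]
```

A child model is then just a choice of `architecture`, e.g. `[(0, "tanh"), (1, "relu"), (1, "identity"), (3, "sigmoid")]`; training it only updates the shared entries it touches.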
Controller
Take the RNN cell as an example: each node is just an operation.
The controller decides:
- Which edges are activated?
- Which computations are performed at each node?
RNN prediction step:
The controller is an LSTM with 100 hidden units, and it samples decisions via softmax classifiers.
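A minimal sketch of such a controller (an assumption-laden illustration, not the paper's code): an LSTM with 100 hidden units that, for each node, samples which earlier node to connect to and which operation to apply, each through a softmax classifier.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class Controller(nn.Module):
    """Sketch of an ENAS-style controller: an LSTM that autoregressively
    samples one decision per step via a softmax classifier."""
    def __init__(self, num_nodes=4, ops=("tanh", "relu", "sigmoid", "identity"), hidden=100):
        super().__init__()
        self.num_nodes, self.ops, self.hidden = num_nodes, ops, hidden
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.embed = nn.Embedding(num_nodes + len(ops), hidden)
        self.node_head = nn.Linear(hidden, num_nodes)  # which earlier node to connect to
        self.op_head = nn.Linear(hidden, len(ops))     # which operation to perform

    def sample(self):
        h = c = torch.zeros(1, self.hidden)
        inp = torch.zeros(1, self.hidden)
        arch, log_probs = [], []
        for node in range(self.num_nodes):
            # Decision 1: which earlier node feeds this one (edge activation).
            h, c = self.lstm(inp, (h, c))
            dist = Categorical(logits=self.node_head(h)[:, :node + 1])
            prev = dist.sample()
            log_probs.append(dist.log_prob(prev))
            inp = self.embed(prev)
            # Decision 2: which operation this node performs.
            h, c = self.lstm(inp, (h, c))
            dist = Categorical(logits=self.op_head(h))
            op = dist.sample()
            log_probs.append(dist.log_prob(op))
            inp = self.embed(self.num_nodes + op)
            arch.append((prev.item(), self.ops[op.item()]))
        return arch, torch.stack(log_probs).sum()
```

The returned log-probability sum is what the policy-gradient update of the controller (sketched further below) needs.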
Training Process
Training of sampled architecture
The sampled architecture can be:
- RNN for Penn Treebank: 400 training steps, each on a minibatch of 64 examples.
- CNN for CIFAR-10: minibatches of 128 examples (a training sketch follows this list).
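A sketch of this shared-weight training phase, assuming the `SharedDAG` and `Controller` sketches above plus a hypothetical data `loader` and `loss_fn`: the controller is held fixed and each minibatch is routed through a freshly sampled child model.

```python
import torch

def train_shared_weights(shared_dag, controller, loader, loss_fn, optimizer):
    """One pass of shared-weight training: only the DAG's weights are updated."""
    shared_dag.train()
    for x, y in loader:                    # e.g. 400 steps of 64 examples for PTB
        with torch.no_grad():
            arch, _ = controller.sample()  # the controller itself is not updated here
        loss = loss_fn(shared_dag(x, arch), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```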
Overall
- It fixes the controller, runs the controller's prediction M times, and gets M child models.
- The final loss is the expected loss over all child models; its gradient with respect to the shared weights is computed with a Monte Carlo estimate (written out below).
- The experiments show that M = 1 works just fine.
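Written out, the Monte Carlo estimate is (with shared weights $\omega$, controller policy $\pi(m;\theta)$, and $M$ sampled child models $m_i$):

$$
\nabla_{\omega}\,\mathbb{E}_{m \sim \pi(m;\theta)}\big[\mathcal{L}(m;\omega)\big]
\approx \frac{1}{M}\sum_{i=1}^{M} \nabla_{\omega}\,\mathcal{L}(m_i;\omega)
$$

With M = 1, this reduces to ordinary SGD on the loss of a single sampled child model per update.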
Training of the controller
The controller is trained for 2000 steps.
Overall
Fix the weights of the large computational graph and update the controller to maximize the expected reward, using policy gradients (REINFORCE).
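A rough sketch of this controller-update phase (REINFORCE with a moving-average baseline; the `reward_fn` below, e.g. validation accuracy of the sampled child under the fixed shared weights, is a placeholder name, not an API from the paper):

```python
import torch

def train_controller(controller, reward_fn, optimizer, steps=2000, bl_decay=0.99):
    """Policy-gradient (REINFORCE) update of the controller; shared weights stay fixed."""
    baseline = 0.0
    for _ in range(steps):
        arch, log_prob = controller.sample()    # one child model per step
        with torch.no_grad():
            reward = reward_fn(arch)            # e.g. validation accuracy, shared weights frozen
        baseline = bl_decay * baseline + (1 - bl_decay) * reward
        loss = -(reward - baseline) * log_prob  # maximize the expected reward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```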
Predict CNN
When predicting a CNN, each node can still be an operation, and the controller also decides which earlier nodes to connect to it (skip connections).
Predict Small Model
Each cell has 4 nodes, and the controller predicts the operation of each node in one cell.
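As a purely illustrative example (operation names and depth are made up for the sketch, not taken from the paper's exact search space), a sampled cell can be written down as one (input node, operation) decision per node, and the full model simply stacks copies of that cell:

```python
# Hypothetical sampled cell with 4 nodes: each entry is (input_node, operation).
sampled_cell = [
    (0, "sep_conv_3x3"),  # node 1: reads the cell input
    (0, "sep_conv_5x5"),  # node 2: also reads the cell input
    (1, "avg_pool_3x3"),  # node 3: reads node 1
    (2, "identity"),      # node 4: reads node 2
]

# The final architecture repeats the same predicted cell several times.
NUM_CELLS = 6  # arbitrary depth for the illustration
network = [sampled_cell for _ in range(NUM_CELLS)]
```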
Experiments