Learning Transferable Architectures for Scalable Image Recognition

Posted on April 28, 2022 · 2 minute read

Introduction

Problems

Previous work on NAS with reinforcement learning has a huge search space and is hard to apply directly to large datasets, because training every sampled child architecture on a large dataset takes far too long.

Contributions

The main contribution of this work is the design of a novel search space, such that the best architecture found on the CIFAR-10 dataset would scale to larger, higher resolution image datasets like ImageNet across a range of computational settings.

Key Features of the Search Space

**The overall architecture of the convolutional nets is manually predetermined**, with:

  • Two kinds of cells: a Normal Cell that keeps the feature-map size and a Reduction Cell that halves the spatial resolution.

    Each cell contains B = 5 blocks, and each block has a fixed structure with:

    • Two inputs
    • Two operators
    • One combination method.

    But there are several options for each of these choices (a toy encoding is sketched right after this list):

    • The inputs are chosen from h_i, h_{i-1}, or the outputs of earlier blocks in the same cell.
    • 13 candidate operations for each operator (identity, regular/dilated/separable convolutions, and poolings of various sizes).
    • 2 options for the combination method: element-wise addition or concatenation.
  • Each cell receives as input two hidden states h_i and h_{i-1}, which are the outputs of the two previous cells (or of the input image at the bottom of the network).
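
To make the block structure concrete, here is a minimal, hypothetical encoding of the five decisions made for one block. The class, field names, operation strings, and the example cell below are illustrative only; they are not the paper's code or the actual NASNet-A cell.

```python
from dataclasses import dataclass

@dataclass
class Block:
    # Indices of the hidden states used as operands: 0 = h_{i-1}, 1 = h_i,
    # 2+ = outputs of earlier blocks in the same cell (illustrative convention).
    input_1: int
    input_2: int
    op_1: str      # operation applied to input_1, e.g. "sep_conv_3x3"
    op_2: str      # operation applied to input_2, e.g. "max_pool_3x3"
    combine: str   # how the two results are merged: "add" or "concat"

# A cell is a list of 5 such blocks; a full architecture is defined by one
# Normal cell and one Reduction cell found by the search. Toy example:
example_cell = [
    Block(0, 1, "sep_conv_3x3", "identity",     "add"),
    Block(1, 1, "sep_conv_5x5", "avg_pool_3x3", "add"),
    Block(0, 2, "avg_pool_3x3", "sep_conv_3x3", "add"),
    Block(0, 1, "sep_conv_3x3", "max_pool_3x3", "add"),
    Block(1, 3, "sep_conv_5x5", "identity",     "add"),
]
```

In the paper, any block outputs that are never used as inputs to later blocks are concatenated together to form the cell's output.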

Search Target

  • All of the options of every block (the two inputs, the two operations, and the combination method) in both kinds of cells.

Search Method

Overall

At a high level, the search follows the usual RL-based NAS loop: the controller RNN samples a child architecture (the two cells) with some probability, the child network is built and trained on CIFAR-10, and its validation accuracy is fed back as the reward used to update the controller.

Controller Sampling

The controller is a one-layer LSTM with 100 hidden units that predicts a child architecture. The prediction process is as follows:

The predictions are made autoregressively: the choice of the first hidden state is fed back into the controller when predicting the second hidden state, the choice of the second hidden state is fed in when predicting the operation applied to the first hidden state, and so on, so each of a block's five decisions is conditioned on all of the previous ones.
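
Below is a minimal PyTorch-style sketch of this autoregressive sampling for one block (five predictions, each choice fed back as the next input). The class name and most sizes are illustrative; the 1-layer, 100-unit LSTM matches the post, but the fixed two-way input choice is a simplification (in the paper the set of candidate inputs grows as blocks are added, and a single controller RNN is unrolled over all decisions of both cells).

```python
import torch
import torch.nn as nn

class BlockController(nn.Module):
    """Illustrative autoregressive controller for one block: 5 decisions
    (input 1, input 2, op 1, op 2, combination), each sampled choice fed
    back in as the input for the next prediction."""

    def __init__(self, n_inputs=2, n_ops=13, n_combine=2, hidden=100):
        super().__init__()
        self.choices = [n_inputs, n_inputs, n_ops, n_ops, n_combine]
        self.lstm = nn.LSTMCell(hidden, hidden)
        # one embedding table and one softmax head per decision step
        self.embed = nn.ModuleList([nn.Embedding(c, hidden) for c in self.choices])
        self.head = nn.ModuleList([nn.Linear(hidden, c) for c in self.choices])
        self.start = nn.Parameter(torch.zeros(1, hidden))  # learned start token

    def sample_block(self):
        h = self.start.new_zeros(1, self.lstm.hidden_size)
        c = self.start.new_zeros(1, self.lstm.hidden_size)
        x, decisions, log_probs = self.start, [], []
        for step, head in enumerate(self.head):
            h, c = self.lstm(x, (h, c))
            dist = torch.distributions.Categorical(logits=head(h))
            choice = dist.sample()                   # pick one of the options
            decisions.append(int(choice))
            log_probs.append(dist.log_prob(choice))  # kept for the RL update
            x = self.embed[step](choice)             # feed the choice back in
        return decisions, torch.stack(log_probs).sum()
```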

Controller Training

The controller records the probability of every prediction, so one sampled child architecture corresponds to 10B predictions in total. The joint probability of the architecture (the product of all of these probabilities) is then used to compute policy gradients with reinforcement learning; the paper trains the controller with Proximal Policy Optimization (PPO), using the child network's validation accuracy as the reward.

5 (search targets per block) × B (blocks per cell) × 2 (kinds of cell) = 10B predictions
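
As a rough illustration of the update, here is a vanilla REINFORCE-style sketch that reuses the BlockController above. The paper trains the controller with PPO rather than plain REINFORCE, and train_child_and_evaluate is a hypothetical stand-in for building the sampled child network, training it on CIFAR-10, and returning its validation accuracy (the reward).

```python
import random
import torch

def sample_architecture(controller, num_blocks=5):
    """Sample the Normal and Reduction cells: 2 * num_blocks blocks = 10B decisions."""
    blocks, log_probs = [], []
    for _ in range(2 * num_blocks):
        decisions, lp = controller.sample_block()
        blocks.append(decisions)
        log_probs.append(lp)
    # sum of log-probabilities = log of the joint probability of the architecture
    return blocks, torch.stack(log_probs).sum()

def train_child_and_evaluate(blocks):
    # Hypothetical stand-in: the real pipeline trains the sampled child
    # network on CIFAR-10 and returns its validation accuracy.
    return random.uniform(0.80, 0.95)

controller = BlockController()
optimizer = torch.optim.Adam(controller.parameters(), lr=1e-3)  # illustrative settings
baseline = 0.0  # moving average of rewards, used to reduce gradient variance

for step in range(10):  # a few toy iterations
    blocks, log_prob = sample_architecture(controller)
    reward = train_child_and_evaluate(blocks)
    baseline = 0.95 * baseline + 0.05 * reward
    loss = -(reward - baseline) * log_prob  # gradient ascent on expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```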

Experiment Results

Performance

The best cell found on CIFAR-10 (NASNet-A) achieves roughly 2.4% test error there. Scaled up to ImageNet, it reaches 82.7% top-1 / 96.2% top-5 accuracy, surpassing the best human-designed architectures of the time, while a mobile-sized variant reaches about 74% top-1 accuracy.

Efficiency

Across a range of computational budgets, the NASNet models match or exceed hand-designed networks at comparable or lower cost; the largest model attains its ImageNet accuracy with roughly 9 billion fewer FLOPs than the previous state of the art.




