Learning Transferable Architectures for Scalable Image
2 minute read ∼ Filed in : A paper noteIntroduction
Problems
Previous work like RF for NAS has a huge search space and is hard to train on the large datasets since training each child architecture requires lots of time with large datasets.
Contributions
The main contribution of this work is the design of a novel search space, such that the best architecture found on the CIFAR-10 dataset would scale to larger, higher resolution image datasets like ImageNet across a range of computational settings.
Key Feature of the search Space
**The overall architectures of the convolutional nets are manually predetermined. **with
-
Two kinds of cells
Each cell has 5 blocks, and each block has a fixed architecture with
- Two inputs
- Two operators
- One combination method.
But there are many options for each of the above.
- 2 options for each input.
- 9 options for each operator.
- 2 options for each combination method
-
Each cell receives as input two initial hidden states hi and hi-1 which are the outputs of two cells in the previous two lower layers or the input image.
Search Target:
- All options of each block
Search Method
OverAll
Controller Sampling
The controller uses one-layer LSTM with 100 hidden units. to predict a child’s architecture. And the prediction process is as follows:
The selection result of one hidden state is the input when predicting the second hidden state, And the result of the second hidden state is the input when predicting the first operation of the first hidden state.
Controller Training
After each prediction, the controller will record their probability, so there is a total of 10B prediction results. The controller will use joint probability ( product of all probabilities. ) to compute the gradients by hiring reinforcement learning like PPO
5(# Search target) * B(# block) * 2 ( 2 kinds of cell) = 10B
Experiment Result
Performance
Efficiency