AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
Introduction
Motivation
Context, related work, and gap:
Existing NAS follows two stages:
- Train one big supernet only once (or multiple models with weight sharing). The training alternates between two steps:
  - sampling a batch of architectures, and
  - training them with weight sharing for one SGD step, then re-sampling.
- Sample sub-models (or single models, when weight sharing is used) from the pre-trained supernet according to resource constraints such as FLOPs, memory footprint, and runtime latency budgets on target devices.
In the first stage, existing work mostly samples the batch of architectures uniformly at random at each training step.
However, uniform sampling carries no information from the search stage back into the training stage, and thus misses the opportunity to further boost the accuracy of the searched architectures.
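For concreteness, a minimal sketch of this uniform-sampling training step is below. `supernet.sample_uniform_arch()` and `supernet.set_active_arch()` are hypothetical placeholders for a weight-sharing supernet API, not the paper's actual code.

```python
import torch
import torch.nn.functional as F

# Baseline stage-1 step: sample a few architectures uniformly at random and
# accumulate their gradients into the shared weights before one SGD update.
def train_step_uniform(supernet, optimizer, images, labels, num_archs=4):
    optimizer.zero_grad()
    for _ in range(num_archs):
        arch = supernet.sample_uniform_arch()   # uniform over the search space
        supernet.set_active_arch(arch)          # activate that sub-network (shared weights)
        loss = F.cross_entropy(supernet(images), labels)
        loss.backward()                         # gradients accumulate in the shared weights
    optimizer.step()                            # one SGD step covers all sampled archs
```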
Contribution
The paper mainly targets the sampling phase of the first stage.
It proposes a resource-aware sampling algorithm that pays more attention to architectures that are more likely to improve the Pareto front. The algorithm has two properties:
- it decides which architectures to sample next, and
- it samples efficiently, with little extra computation overhead.
In short, it brings resource constraints (e.g., model size or FLOPs) into the sampling phase: it samples the best or the worst architecture under each target size (best or worst is decided by a pre-trained accuracy predictor) and then trains those sampled models. Sampling and training are repeated until convergence, as in the sketch below.
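The following sketch illustrates the attentive sampling step under stated assumptions; `sample_arch_with_flops` and `predictor.predict` are illustrative names, not the paper's API.

```python
# Under each sampled FLOPs target, draw k candidates, rank them with the
# accuracy predictor, and keep only the predicted-best (BestUp) or
# predicted-worst (WorstUp) one for the weight-sharing training step.
def attentive_sample(supernet, predictor, flops_targets, k=10, mode="best"):
    chosen = []
    for tau in flops_targets:                       # e.g., targets in MFLOPs
        candidates = [supernet.sample_arch_with_flops(tau) for _ in range(k)]
        scores = [predictor.predict(arch) for arch in candidates]
        if mode == "best":                          # BestUp: push the Pareto front up
            idx = max(range(k), key=lambda i: scores[i])
        else:                                       # WorstUp: lift the worst performers
            idx = min(range(k), key=lambda i: scores[i])
        chosen.append(candidates[idx])
    return chosen                                   # architectures to train this step
```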
Method
To make the training stage aware of the requirements of the search stage, the paper reformulates supernet training as follows.
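Roughly, and in notation that may differ from the paper's, the reformulation replaces the uniform expectation over architectures with an expectation over resource targets, each served by the predicted-best (BestUp) or predicted-worst (WorstUp) architecture that meets it:

```latex
% Baseline: train shared weights W under a uniform architecture distribution.
\min_{W}\ \mathbb{E}_{\alpha \sim \mathrm{Uniform}(\mathcal{A})}
  \left[ \mathcal{L}\big(W(\alpha);\ \mathcal{D}_{\mathrm{train}}\big) \right]

% Attentive version: sample a resource target \tau (e.g., a FLOPs budget) and
% spend the gradient step on the architecture selected by the predictor
% (arg max of predicted accuracy for BestUp, arg min for WorstUp).
\min_{W}\ \mathbb{E}_{\tau \sim P(\tau)}
  \left[ \mathcal{L}\big(W(\alpha^{\star}_{\tau});\ \mathcal{D}_{\mathrm{train}}\big) \right],
\qquad
\alpha^{\star}_{\tau} \in \operatorname*{arg\,max}_{\alpha \in \mathcal{A},\ \mathrm{FLOPs}(\alpha) \le \tau}
  \widehat{\mathrm{Acc}}(\alpha)
```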
Experiment
Performance predictor
The paper uses a tree-based model as the accuracy predictor.
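A hedged sketch of such a predictor is below, assuming each architecture is encoded as a fixed-length feature vector (e.g., per-stage widths, depths, kernel sizes); the exact tree model and encoding used in the paper may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Fit a tree-ensemble regressor on (architecture encoding, measured accuracy)
# pairs; new candidates can then be ranked cheaply by predicted accuracy.
def fit_accuracy_predictor(arch_features, measured_accuracies):
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(np.asarray(arch_features), np.asarray(measured_accuracies))
    return model

# Usage (illustrative): rank candidate architectures by predicted accuracy.
# predictor = fit_accuracy_predictor(X_train, y_train)
# scores = predictor.predict(X_candidates)
```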
BestUp or WorstUp
Two sampling variants are compared: BestUp spends training on the predicted-best architectures under each constraint, while WorstUp trains the predicted-worst ones.
End-to-end results
The algorithm obtains strong models across different MFLOPs budgets.