AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
Introduction
Motivation
Context, related work, and gap:
Existing NAS follows two stages:
- Train one big supernet only once (or multiple models with weight sharing). The training alternates between two steps:
  - sampling a batch of architectures, and
  - training them with weight sharing for one SGD step, then re-sampling.
- Sample sub-models (or single models, when weight sharing is used) from the pre-trained supernet according to resource constraints such as FLOPs, memory footprint, and runtime latency budgets on target devices.
In the first stage, existing work mostly samples the batch of architectures uniformly at random at each training step.
However, uniform sampling carries no information from the search stage back into the training stage, and thus misses the opportunity to further boost the accuracy of the searched architectures.
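For concreteness, a minimal sketch of this uniform-sampling training step is below. `supernet.sample_uniform_arch()` and `supernet.set_active_arch()` are hypothetical placeholders for a weight-sharing supernet API, not the paper's actual code.

```python
import torch
import torch.nn.functional as F

# Baseline stage-1 step: sample a few architectures uniformly at random and
# accumulate their gradients into the shared weights before one SGD update.
def train_step_uniform(supernet, optimizer, images, labels, num_archs=4):
    optimizer.zero_grad()
    for _ in range(num_archs):
        arch = supernet.sample_uniform_arch()   # uniform over the search space
        supernet.set_active_arch(arch)          # activate that sub-network (shared weights)
        loss = F.cross_entropy(supernet(images), labels)
        loss.backward()                         # gradients accumulate in the shared weights
    optimizer.step()                            # one SGD step covers all sampled archs
```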
Contribution
The paper mainly targets the sampling phase of the first stage.
It proposes a resource-aware sampling algorithm that pays more attention to architectures that are more likely to improve the Pareto front. The algorithm has two properties:
- it decides which architectures to sample next, and
- it samples efficiently, with little extra computation overhead.
In short, it brings resource constraints (e.g., model size or FLOPs) into the sampling phase: it samples the best or the worst architecture under each target size (best or worst is decided by a pre-trained accuracy predictor) and then trains those sampled models. Sampling and training are repeated until convergence, as in the sketch below.
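The following sketch illustrates the attentive sampling step under stated assumptions; `sample_arch_with_flops` and `predictor.predict` are illustrative names, not the paper's API.

```python
# Under each sampled FLOPs target, draw k candidates, rank them with the
# accuracy predictor, and keep only the predicted-best (BestUp) or
# predicted-worst (WorstUp) one for the weight-sharing training step.
def attentive_sample(supernet, predictor, flops_targets, k=10, mode="best"):
    chosen = []
    for tau in flops_targets:                       # e.g., targets in MFLOPs
        candidates = [supernet.sample_arch_with_flops(tau) for _ in range(k)]
        scores = [predictor.predict(arch) for arch in candidates]
        if mode == "best":                          # BestUp: push the Pareto front up
            idx = max(range(k), key=lambda i: scores[i])
        else:                                       # WorstUp: lift the worst performers
            idx = min(range(k), key=lambda i: scores[i])
        chosen.append(candidates[idx])
    return chosen                                   # architectures to train this step
```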
Method
To make the training stage aware of the requirements of the search stage, the paper reformulates supernet training as follows.
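Roughly, and in notation that may differ from the paper's, the reformulation replaces the uniform expectation over architectures with an expectation over resource targets, each served by the predicted-best (BestUp) or predicted-worst (WorstUp) architecture that meets it:

```latex
% Baseline: train shared weights W under a uniform architecture distribution.
\min_{W}\ \mathbb{E}_{\alpha \sim \mathrm{Uniform}(\mathcal{A})}
  \left[ \mathcal{L}\big(W(\alpha);\ \mathcal{D}_{\mathrm{train}}\big) \right]

% Attentive version: sample a resource target \tau (e.g., a FLOPs budget) and
% spend the gradient step on the architecture selected by the predictor
% (arg max of predicted accuracy for BestUp, arg min for WorstUp).
\min_{W}\ \mathbb{E}_{\tau \sim P(\tau)}
  \left[ \mathcal{L}\big(W(\alpha^{\star}_{\tau});\ \mathcal{D}_{\mathrm{train}}\big) \right],
\qquad
\alpha^{\star}_{\tau} \in \operatorname*{arg\,max}_{\alpha \in \mathcal{A},\ \mathrm{FLOPs}(\alpha) \le \tau}
  \widehat{\mathrm{Acc}}(\alpha)
```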
Experiment
Performance predictor
The paper uses a tree-based model as the accuracy predictor.
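A hedged sketch of such a predictor is below, assuming each architecture is encoded as a fixed-length feature vector (e.g., per-stage widths, depths, kernel sizes); the exact tree model and encoding used in the paper may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Fit a tree-ensemble regressor on (architecture encoding, measured accuracy)
# pairs; new candidates can then be ranked cheaply by predicted accuracy.
def fit_accuracy_predictor(arch_features, measured_accuracies):
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(np.asarray(arch_features), np.asarray(measured_accuracies))
    return model

# Usage (illustrative): rank candidate architectures by predicted accuracy.
# predictor = fit_accuracy_predictor(X_train, y_train)
# scores = predictor.predict(X_candidates)
```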
BestUp or WorstUp
Two sampling variants are compared: BestUp spends training on the predicted-best architectures under each constraint, while WorstUp trains the predicted-worst ones.
End-to-end results
The algorithm obtains strong models across different MFLOPs budgets.