AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling

Posted on August 1, 2022 · 1 minute read

Introduction

Motivation

Context, related work, and gap:

Existing NAS follows two stages:

  1. Train a big model (supernet) only once, or multiple single models with weight sharing.

    The training itself can be further divided into two steps:

    1. sample a batch of architectures;
    2. train them with weight sharing for one SGD step, then re-sample (a sketch follows this list).

    (Figure: stage 1, supernet training with weight sharing.)

  2. Sample sub-models (or a single model, if using weight sharing) from the pre-trained big model according to resource constraints such as FLOPs, memory footprint, and on-device latency budgets.

    (Figure: stage 2, sampling sub-networks under resource constraints.)
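For concreteness, here is a minimal PyTorch-style sketch of that inner training loop. The `supernet` object, its `sample_uniform_subnet()` helper, and the config-conditioned forward pass are hypothetical stand-ins for a real weight-sharing implementation, not the paper's actual code:

```python
import torch

# Minimal sketch of stage 1: supernet training with uniform sampling.
# `supernet.sample_uniform_subnet()` and the config-conditioned forward
# are hypothetical stand-ins for a weight-sharing implementation.

def train_supernet(supernet, loader, optimizer, subnets_per_step=4):
    criterion = torch.nn.CrossEntropyLoss()
    for images, labels in loader:
        optimizer.zero_grad()
        # Sample a batch of architectures; all of them share the
        # supernet's weights, so their gradients accumulate into W.
        for _ in range(subnets_per_step):
            config = supernet.sample_uniform_subnet()  # hypothetical helper
            outputs = supernet(images, config)         # run only that sub-net
            loss = criterion(outputs, labels)
            loss.backward()
        optimizer.step()  # one shared SGD step, then re-sample
```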

In the first stage, existing approaches mostly sample the batch of architectures uniformly at random at each training step.

However, uniform sampling carries no information from the search stage (such as the resource constraints and the Pareto front of interest) into the training stage, and thus misses the opportunity to further boost the accuracy of the final architectures.

Contribution

The paper mainly targets the sampling phase of the first stage.

It proposes a resource-aware sampling algorithm that pays more attention to architectures that are more likely to produce a better Pareto front. The algorithm has two properties:

  1. It decides which architectures to sample next.
  2. It samples efficiently, with little extra computation overhead.

In short, it brings the model size into the sampling phase: under each size budget it samples the best (or worst) architecture, where best or worst is judged by a pre-trained performance predictor, and then trains those sampled models. Sampling and training are repeated until convergence (see the sketch below).
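A minimal sketch of what such an attentive sampler could look like. `supernet.sample_subnet_under_flops(...)` and `predictor` are hypothetical stand-ins (a constrained sampler and a pre-trained accuracy predictor), not the paper's code:

```python
# Minimal sketch of attentive sampling (BestUp / WorstUp).
# `supernet.sample_subnet_under_flops(target)` and `predictor` are
# hypothetical: the former draws a sub-net within a FLOPs budget, the
# latter maps an architecture config to a predicted accuracy.

def attentive_sample(supernet, predictor, flops_targets, k=10, mode="best"):
    """Pick one sub-network per FLOPs target out of k uniform candidates."""
    picked = []
    for target in flops_targets:
        # Draw k candidates that satisfy the current FLOPs budget.
        candidates = [supernet.sample_subnet_under_flops(target)
                      for _ in range(k)]
        scores = [predictor(c) for c in candidates]  # predicted accuracy
        if mode == "best":   # BestUp: train the predicted-best candidate
            idx = scores.index(max(scores))
        else:                # WorstUp: train the predicted-worst candidate
            idx = scores.index(min(scores))
        picked.append(candidates[idx])
    return picked
```

Only the selected architectures are then trained with weight sharing, so the extra cost over uniform sampling is just k predictor evaluations per constraint.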

Method

To make the training stage aware of the requirements of the search stage, the paper reformulates supernet training so that architectures are drawn from an attentive, constraint-aware distribution rather than a uniform one.

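A rough reconstruction of the objective, with approximate notation: plain weight-sharing training draws architectures uniformly from the search space, while the attentive reformulation first draws a resource constraint and then trains the predictor-ranked best (or worst) architecture satisfying it:

```latex
% Plain weight-sharing training: \alpha drawn uniformly from search space A.
\min_{W} \;\; \mathbb{E}_{\alpha \sim \Gamma(A)}
  \Big[ \mathcal{L}\big(W(\alpha); \mathcal{D}_{\mathrm{train}}\big) \Big]

% Attentive reformulation: first draw a constraint \tau (e.g., a FLOPs
% budget), then train the best (BestUp) or worst (WorstUp) architecture
% in the constrained sub-space A_\tau, as ranked by the predictor.
\min_{W} \;\; \mathbb{E}_{\tau \sim \Gamma(T)}
  \Big[ \mathcal{L}\big(W(\hat{\alpha}_{\tau}); \mathcal{D}_{\mathrm{train}}\big) \Big],
\qquad
\hat{\alpha}_{\tau} = \operatorname*{arg\,max}_{\alpha \in A_{\tau}}
  \widehat{\mathrm{acc}}(\alpha)
  \quad \text{(BestUp; use arg\,min for WorstUp)}
```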

Experiment

Performance predictor

The paper uses a tree-based model as the performance predictor, mapping an architecture encoding to a predicted accuracy.

(Figure: evaluation of the performance predictor.)
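As a sketch of what training such a predictor could look like, assuming we have collected (architecture encoding, measured accuracy) pairs; the random-forest choice and encoding scheme here are illustrative assumptions, not necessarily the paper's exact setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Sketch of a tree-based performance predictor. Assumes each architecture
# is encoded as a fixed-length vector of discrete choices (depth, width,
# kernel sizes, ...) and paired with a measured validation accuracy.

def fit_predictor(arch_encodings, accuracies):
    """arch_encodings: (n, d) array; accuracies: (n,) array."""
    predictor = RandomForestRegressor(n_estimators=100, random_state=0)
    predictor.fit(np.asarray(arch_encodings), np.asarray(accuracies))
    return predictor

# Usage: rank unseen candidates by predicted accuracy.
# predictor = fit_predictor(encodings, accs)
# scores = predictor.predict(candidate_encodings)
```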

BestUp or WorstUp

BestUp samples the architecture with the highest predicted accuracy under each resource constraint, while WorstUp samples the one with the lowest; the paper evaluates both strategies.

(Figures: comparison of the BestUp and WorstUp sampling strategies.)

End-to-end results

The algorithm obtains strong models under a range of MFLOPs budgets.

(Figure: accuracy versus MFLOPs for the searched models.)




