NAS-BENCH-SUITE: NAS EVALUATION IS (NOW) SURPRISINGLY EASY
3 minute read ∼ Filed in: A paper note
Introduction
Problem
A search space in NAS is the set of all architectures that the NAS algorithm is allowed to select.
- Some search spaces are defined by a cell-based structure within a macro-structure: the macro-structure is completely fixed, while the structure inside each cell is chosen from a set of neural network operations.
- Other search spaces have variable macro-structures, e.g., with varying depths, widths, and numbers of channels.
Each benchmark consists of a dataset, a search space, and a fixed evaluation pipeline with predefined hyperparameters for training each architecture.
- A tabular NAS benchmark additionally provides pre-computed evaluations for all architectures in the search space.
- A surrogate NAS benchmark instead provides a fixed surrogate model that predicts the performance of any architecture.
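To make the distinction concrete, here is a minimal Python sketch of the two kinds of queryable benchmarks. The class names and the architecture encoding (a plain string) are illustrative assumptions, not the paper's actual API.

```python
from typing import Callable, Dict

class TabularBenchmark:
    """Tabular benchmark: looks up pre-computed evaluations for every
    architecture in the search space (hypothetical interface)."""
    def __init__(self, table: Dict[str, float]):
        # architecture encoding -> validation accuracy
        self.table = table

    def query(self, arch: str) -> float:
        return self.table[arch]

class SurrogateBenchmark:
    """Surrogate benchmark: predicts performance with a fixed surrogate
    model instead of an exhaustive table (hypothetical interface)."""
    def __init__(self, surrogate: Callable[[str], float]):
        self.surrogate = surrogate

    def query(self, arch: str) -> float:
        return self.surrogate(arch)
```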
Most existing work draws conclusions from only a small subset of benchmarks, so those conclusions may not generalize across diverse datasets and tasks.
Many early NAS search algorithms are also strongly coupled to a particular search space (e.g., via weight sharing).
Contribution
- Conducts an analysis of the generalizability of NAS algorithms and their hyperparameters across 25 settings, finding that hyperparameters tuned on a subset of benchmarks often cannot be reused on other benchmarks.
- Proposes NAS-Bench-Suite, a system composing many queryable NAS benchmarks (25 different combinations of search spaces and datasets) so that others can evaluate new NAS algorithms quickly.
In effect, the system re-implements each NAS algorithm so that it runs on several search spaces through a unified interface.
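A rough sketch of what such a unified interface buys you: one evaluation loop that runs any search algorithm on every (search space, dataset) combination. The `search(bench, budget)` signature is an assumption for illustration, not the suite's real API.

```python
def evaluate_everywhere(search, benchmarks, budget=100):
    """Run one NAS algorithm on every benchmark behind a shared query()
    interface and collect the accuracy of the architecture it returns."""
    results = {}
    for name, bench in benchmarks.items():
        best_arch = search(bench, budget)       # algorithm only sees query()
        results[name] = bench.query(best_arch)  # final validation accuracy
    return results
```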
Conclusion of the paper
- There is no single best NAS method: which method performs best depends heavily on the benchmark (search space).
- Along similar lines, if a NAS method performs well on the popular benchmarks NAS-Bench-101 and all three datasets of NAS-Bench-201, this does not imply, contrary to what one might expect, that it will also perform well on other NAS benchmarks.
- Tuning a NAS algorithm's hyperparameters can make it dramatically better, but transferring the tuned hyperparameters across benchmarks often fails.
NAS Benchmark Statistics
Validation accuracy
The paper first reports the minimum and maximum validation accuracy of each setting (search space + dataset).
This shows the diversity of the benchmarks, which is important context when comparing results across them.
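For a tabular benchmark this statistic is a simple enumeration; a small sketch reusing the hypothetical `TabularBenchmark` from above:

```python
def accuracy_range(bench: TabularBenchmark):
    """Minimum and maximum validation accuracy over all architectures."""
    accs = bench.table.values()
    return min(accs), max(accs)
```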
The paper then runs three types of search algorithms on these settings to support the following claims:
- Even if a NAS algorithm does well on NAS-Bench-101 and all three datasets of NAS-Bench-201, this does not imply it will generalize to other NAS benchmarks.
- NAS algorithms need re-tuning when applied to different benchmarks.
Search Algorithms
- Black-box NAS
- Random search
- Regularized evolution
- Local search
- BANANAS
- NPENAS
- Model-based performance prediction methods.
- BOHAMIANN
- Gaussian process
- Random forest
- Neural architecture optimization
- XGBoost
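As a concrete instance of the black-box setting, here is a minimal random search sketch against the hypothetical `query()` interface above; `sample_arch` stands in for the search space's own sampling routine.

```python
import random

def random_search(bench, sample_arch, budget=100, seed=0):
    """Black-box random search: sample architectures uniformly and keep
    the one with the best queried validation accuracy."""
    rng = random.Random(seed)
    best_arch, best_acc = None, float("-inf")
    for _ in range(budget):
        arch = sample_arch(rng)   # draw a random architecture
        acc = bench.query(arch)   # cheap table lookup / surrogate prediction
        if acc > best_acc:
            best_arch, best_acc = arch, acc
    return best_arch, best_acc
```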
Generalizability of NAS Algorithms
The paper first conducts experiments showing that an algorithm well suited to one search space (benchmark) may not perform as well on another.
Experiments
Run each algorithm both with and without hyperparameter optimization.
The paper then summarizes the results by computing the average rank of each black-box algorithm and performance prediction method across all 25 NAS benchmarks.
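A sketch of that average-rank summary, assuming `scores[method][benchmark]` holds each method's final accuracy (the data layout is an assumption):

```python
def average_ranks(scores):
    """Rank methods per benchmark (rank 1 = best accuracy), then average
    each method's rank across all benchmarks."""
    methods = list(scores)
    benchmarks = next(iter(scores.values())).keys()
    ranks = {m: [] for m in methods}
    for b in benchmarks:
        ordered = sorted(methods, key=lambda m: scores[m][b], reverse=True)
        for r, m in enumerate(ordered, start=1):
            ranks[m].append(r)
    return {m: sum(rs) / len(rs) for m, rs in ranks.items()}
```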
Findings
- No algorithm performs well across all search spaces.
- The best predictor with default hyperparameters is the random forest (RF), while the best predictor when tuned on each individual benchmark is XGBoost.
Generalizability of Hyperparameters
While the previous section assessed the generalizability of NAS methods, now we assess the generalizability of the hyperparameters within NAS methods.
Experiments
For a given NAS method, tune it on NAS benchmark A, then evaluate the tuned method on NAS benchmark B and compare against the performance of the best hyperparameters found directly on benchmark B.
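A sketch of the resulting transfer regret, assuming `perf[cfg][bench]` stores the accuracy of hyperparameter configuration `cfg` on the named benchmark (the layout is an assumption):

```python
def transfer_regret(perf, tuned_on, evaluated_on):
    """Accuracy lost on `evaluated_on` by deploying the hyperparameters
    that were best on `tuned_on` (smaller regret is better)."""
    cfg_a = max(perf, key=lambda c: perf[c][tuned_on])      # tuned on A
    cfg_b = max(perf, key=lambda c: perf[c][evaluated_on])  # oracle on B
    return perf[cfg_b][evaluated_on] - perf[cfg_a][evaluated_on]
```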
Findings
- No single benchmark yields tuned hyperparameters that achieve low regret (smaller is better) across most of the other search spaces.
- It is therefore not sufficient to tune hyperparameters on one NAS benchmark and deploy them on other benchmarks, as this often makes performance worse.
One-shot Algorithms
One-shot NAS algorithms train a single supernetwork that represents the entire search space, with all candidate architectures sharing its weights.
The paper compares the performance of three one-shot algorithms (see the supernetwork sketch after this list):
- DARTS
- GDAS
- DrNAS
across several different NAS benchmarks.
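To ground what "supernetwork" means here, below is a hedged PyTorch sketch of the core DARTS-style idea: each edge computes a softmax-weighted mixture of candidate operations, with learnable architecture parameters alpha. The candidate operation set is a simplified assumption, not the paper's exact list.

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """One supernetwork edge: a weighted sum over all candidate ops,
    with weights given by a softmax over learnable parameters alpha."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                # skip connection
            nn.Conv2d(channels, channels, 3, padding=1),  # 3x3 convolution
            nn.AvgPool2d(3, stride=1, padding=1),         # 3x3 average pool
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.alpha, dim=0)
        # After search, DARTS discretizes by keeping the op with the
        # largest weight on each edge.
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```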
(The paper presents plots comparing these one-shot methods across the benchmarks.)