NAS-Bench-101: Towards Reproducible Neural Architecture Search
Introduction
The paper makes the following contributions:
- Introduces NAS-Bench-101, a dataset of 423k unique architectures together with their accuracies and training times (all trained on CIFAR-10).
- Benchmarks a range of NAS and hyperparameter-optimization algorithms on the dataset.
NASBench Dataset
Architecture
Cell Encoding
Each cell is encoded by an adjacency matrix (at most 7 vertices and 9 edges) together with a list of vertex labels, where each intermediate vertex is assigned one of 3 operations. This encoding yields roughly 510M possible cells, and hence roughly 510M possible models.
The same cell is stacked repeatedly to build each model, so a model is fully determined by its cell.
After removing invalid cells (e.g. no path from the input to the output vertex, or more than 9 edges) and de-duplicating equivalent graphs, about 423k unique models remain.
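As a rough illustration (not the paper's code), a cell can be written down as a NumPy adjacency matrix plus an operation list; the validity check below mirrors the pruning rules above (at most 9 edges, a path from the input to the output vertex). The example cell and helper name are mine.

```python
import numpy as np

# Example 7-vertex cell: matrix[i, j] = 1 means an edge from vertex i to vertex j.
# Vertex 0 is the input, vertex 6 is the output; the 5 middle vertices carry operations.
matrix = np.array([
    [0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
])
ops = ['input', 'conv3x3-bn-relu', 'conv1x1-bn-relu',
       'maxpool3x3', 'conv3x3-bn-relu', 'conv3x3-bn-relu', 'output']

def is_valid(matrix, max_edges=9):
    """Check the two pruning rules: edge budget and input->output reachability."""
    if matrix.sum() > max_edges:
        return False
    # Simple graph search (reachability) from the input vertex (index 0).
    n = matrix.shape[0]
    reachable, frontier = {0}, [0]
    while frontier:
        v = frontier.pop()
        for u in range(n):
            if matrix[v, u] and u not in reachable:
                reachable.add(u)
                frontier.append(u)
    return (n - 1) in reachable  # the output vertex must be reachable

print(is_valid(matrix))  # True for the example above
```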
Combine semantics
When multiple edges point to the same vertex, the incoming tensors must be combined; both summation and concatenation are standard choices (NASI uses a weighted combination).
The paper adopts the rule that tensors flowing into the output vertex are concatenated, while those flowing into any other vertex are summed.
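A minimal sketch of that rule in PyTorch (my own helper, assuming the summed tensors have matching shapes):

```python
import torch

def combine(incoming, is_output_vertex):
    """Combine tensors flowing into a vertex: concatenate at the output
    vertex, element-wise sum everywhere else."""
    if is_output_vertex:
        return torch.cat(incoming, dim=1)               # channel-wise concatenation
    return torch.stack(incoming, dim=0).sum(dim=0)      # element-wise sum

# Two feature maps with matching shapes (N, C, H, W).
a = torch.randn(8, 16, 32, 32)
b = torch.randn(8, 16, 32, 32)
print(combine([a, b], is_output_vertex=False).shape)  # torch.Size([8, 16, 32, 32])
print(combine([a, b], is_output_vertex=True).shape)   # torch.Size([8, 32, 32, 32])
```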
Training
Dataset: CIFAR-10, split into 40k training, 10k validation, and 10k test examples.
Hyperparameters:
- A single, fixed set of hyperparameters is used for all NAS-Bench-101 models.
- That set is chosen to maximize the average accuracy over a sample of 50 architectures.
Optimization: RMSProp
Loss: cross-entropy loss with L2 weight decay.
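A rough PyTorch-flavoured sketch of this recipe (the paper's implementation uses TensorFlow, and the learning rate, weight decay, momentum, and the toy stand-in model below are illustrative placeholders, not the paper's tuned values):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in for a NASBench model

# RMSProp optimizer; weight_decay implements the L2 penalty.
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.1, weight_decay=1e-4, momentum=0.9)
criterion = nn.CrossEntropyLoss()

images = torch.randn(64, 3, 32, 32)        # a fake CIFAR-10 batch
labels = torch.randint(0, 10, (64,))

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```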
Metrics
Each architecture is trained three times with different random initializations, and the following metrics are recorded (see the query sketch after this list):
- Training accuracy;
- Validation accuracy;
- Testing accuracy;
- Training time in seconds;
- Number of trainable model parameters.
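These metrics can be read back through the released nasbench Python API; the sketch below is based on my recollection of that API (the file name, ModelSpec arguments, and result keys should be checked against the official repository):

```python
from nasbench import api

# Load the dataset file released with the paper (path/name may differ).
nasbench = api.NASBench('nasbench_only108.tfrecord')

# Describe a cell with its adjacency matrix and operation labels.
cell = api.ModelSpec(
    matrix=[[0, 1, 1, 0, 0, 0, 0],
            [0, 0, 0, 1, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0],
            [0, 0, 0, 0, 0, 1, 0],
            [0, 0, 0, 0, 0, 1, 0],
            [0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 0]],
    ops=['input', 'conv3x3-bn-relu', 'conv1x1-bn-relu', 'maxpool3x3',
         'conv3x3-bn-relu', 'conv3x3-bn-relu', 'output'])

# query() returns the recorded statistics of one training repeat.
data = nasbench.query(cell)
print(data['train_accuracy'], data['validation_accuracy'], data['test_accuracy'],
      data['training_time'], data['trainable_parameters'])
```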
NASBench as a Benchmark
Comparing NAS algorithms
The paper first compares several NAS and hyperparameter-optimization (HPO) algorithms, including:
- random search (RS), SMAC, and BOHB;
- regularized evolution (RE);
- reinforcement-learning (RL) based search.
Each algorithm is run for 500 independent trials (each trial alternates between evaluating architectures and updating the algorithm), and the regret is measured against the cumulative time spent evaluating architectures within the trial.
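As a sketch of this protocol, one trial of random search against the benchmark might look as follows (random_cell, best_test_accuracy, and the time budget are assumed inputs, it reuses the query interface sketched above, and selecting by validation accuracy while reporting test regret is my simplification of the evaluation setup):

```python
def run_random_search_trial(nasbench, random_cell, best_test_accuracy, time_budget=5e5):
    """One trial: repeatedly sample cells, track the best model found so far,
    and record (training time spent, test regret of the incumbent)."""
    spent, best_valid, incumbent_test, history = 0.0, 0.0, 0.0, []
    while spent < time_budget:
        data = nasbench.query(random_cell())          # look up train/eval statistics for a sampled cell
        spent += data['training_time']                # charge the pre-computed training time
        if data['validation_accuracy'] > best_valid:  # select on validation accuracy
            best_valid = data['validation_accuracy']
            incumbent_test = data['test_accuracy']
        history.append((spent, best_test_accuracy - incumbent_test))  # regret vs time
    return history
```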
Findings:
- RE, BOHB, and SMAC perform best and begin to outperform RS after an initial exploration phase.
- They also show the most robust performance across trials.