ZERO-COST PROXIES FOR LIGHTWEIGHT NAS
1 minute read ∼ Filed in: A paper note

Network pruning at initialization uses a single forward/backward pass to compute a saliency criterion, which identifies less-important parameters so the network can be pruned before training.
Recently, work like EcoNAS has used reduced-computation training to evaluate architectures quickly, shrinking settings such as the input size, model size, and number of training samples.
This paper brings pruning techniques into model selection by adapting the pruning metrics to score entire neural network models instead of individual parameters, e.g., by summing the per-parameter saliency over all parameters.
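A minimal sketch of that idea, assuming a PyTorch classification model and using a SNIP-style saliency |θ · ∂L/∂θ| as the example metric (`snip_score` is an illustrative name, not the paper's code):

```python
import torch
import torch.nn as nn

def snip_score(model: nn.Module, inputs: torch.Tensor, targets: torch.Tensor) -> float:
    """Sum a per-parameter saliency (|weight * grad|, SNIP-style) over the
    whole network to get a single scalar score for the architecture."""
    model.zero_grad()
    loss = nn.CrossEntropyLoss()(model(inputs), targets)
    loss.backward()  # one forward/backward pass on a single minibatch
    score = 0.0
    for p in model.parameters():
        if p.grad is not None:
            # the same saliency pruning would use per parameter, just summed up
            score += (p.grad * p.data).abs().sum().item()
    return score
```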
The paper first compares 6 training-free proxies against conventional reduced-training proxies and shows they can outperform EcoNAS in maintaining rank consistency. The training-free proxies include:
- grad_norm: sums the Euclidean norm of the gradients after a single minibatch of training data (see the sketch after this list).
- jacob_cov: the Jacobian covariance score from the "NAS without Training" paper.
- vote: takes a majority vote among the three metrics synflow, jacob_cov, and snip.
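A minimal sketch of the two simplest entries, assuming a PyTorch model; the vote is shown only as a pairwise comparison between two architectures, and all function names are illustrative:

```python
import torch.nn as nn

def grad_norm_score(model: nn.Module, inputs, targets) -> float:
    """grad_norm proxy: sum of the Euclidean norms of the parameter gradients
    after a single minibatch."""
    model.zero_grad()
    nn.CrossEntropyLoss()(model(inputs), targets).backward()
    return sum(p.grad.norm(2).item() for p in model.parameters() if p.grad is not None)

def vote_prefers_a(scores_a: dict, scores_b: dict) -> bool:
    """vote proxy (pairwise reading): architecture A is preferred over B if at
    least two of synflow, jacob_cov, and snip give A the higher score."""
    wins = sum(scores_a[m] > scores_b[m] for m in ("synflow", "jacob_cov", "snip"))
    return wins >= 2
```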
The paper then compares the 6 training-free proxies on 4 datasets of varying sizes and tasks in the PyTorchCV search space, and finds that only synflow performs consistently well.
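Rank consistency here is typically quantified with Spearman's rank correlation between the proxy score and the final trained accuracy of each architecture; a minimal sketch with made-up numbers:

```python
from scipy.stats import spearmanr

# proxy_scores[i] and final_accuracies[i] belong to the same architecture i
# (illustrative values, not results from the paper)
proxy_scores = [3.1, 0.7, 2.4, 5.8, 1.2]
final_accuracies = [0.71, 0.55, 0.66, 0.74, 0.60]

rho, _ = spearmanr(proxy_scores, final_accuracies)
print(f"Spearman rank correlation: {rho:.2f}")  # higher means better rank consistency
```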
Finally, the paper proposes two ways to make better use of the synflow score inside existing NAS search algorithms: zero-cost warmup and zero-cost move proposal.
- warmup: use the training-free metrics to initialize the search, e.g., to warm up the RL controller or the GCN predictor in predictor-based search algorithms.
- move proposal: use the training-free metric to rank candidate moves (e.g., mutations in evolutionary search) and only fully evaluate the top-scoring ones.
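A minimal sketch of how the two integrations could look in an evolutionary-search setting; `mutate`, `proxy_score`, and the candidate pool are assumed to come from the surrounding NAS framework, and both function names are illustrative:

```python
def zero_cost_warmup(candidates, proxy_score, pool_size):
    """Warmup: rank a large random sample by the zero-cost proxy and keep
    only the top-scoring architectures as the initial search pool."""
    return sorted(candidates, key=proxy_score, reverse=True)[:pool_size]

def zero_cost_move(parent, mutate, proxy_score, num_proposals=16):
    """Move proposal: generate several mutations of a parent and pass only
    the one the zero-cost proxy ranks highest on to full evaluation."""
    children = [mutate(parent) for _ in range(num_proposals)]
    return max(children, key=proxy_score)
```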