Improving Keyword Spotting and Language Identification via Neural Architecture Search at Scale
1 minute read ∼ Filed in: A paper note
Introduction
Current problems
- Designing a new network is hard, and the computation it requires is expensive.
- Search-space design is biased toward human expertise, so the resulting model is sub-optimal.
Contribution
The main contribution of this paper is a novel approach to searching for DNN architectures that addresses the two problems mentioned above by:
- defining an incremental search (towers, blocks);
- using transferable training (weight sharing);
- using a set of generic neural network blocks (a recurrent or a convolutional layer);
- using ensembling to reduce the parameters of a single architecture.
Ensembling is a well-studied field. It is domain-agnostic, and it is a natural way to increase the size of the network given a well-performing model architecture.
The searched architecture has fewer parameters and higher performance than the human-designed ones it is compared against.
Algorithm
The algorithm first defines an architecture as a combination of k blocks, where each block has n options.
It then searches for the best architecture over all possible candidates.
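To make the size of this search space concrete, here is a minimal Python sketch. It assumes every block independently picks one of the same n options, which gives n^k candidates. The block names and the value of k are hypothetical placeholders, not the block set used in the paper.

```python
import itertools
import random

# Hypothetical block options; the paper's actual block set is not listed in these notes.
BLOCK_OPTIONS = ["conv", "lstm", "dense"]  # n = 3 options per block
K = 4                                      # k = 4 blocks per architecture


def search_space_size(n_options: int, k_blocks: int) -> int:
    """Number of distinct architectures when each of k blocks picks one of n options."""
    return n_options ** k_blocks


def enumerate_architectures(options, k_blocks):
    """Yield every architecture as a tuple of k block names."""
    return itertools.product(options, repeat=k_blocks)


def sample_architecture(options, k_blocks, rng):
    """Draw one architecture uniformly at random from the search space."""
    return tuple(rng.choice(options) for _ in range(k_blocks))


if __name__ == "__main__":
    print(search_space_size(len(BLOCK_OPTIONS), K))  # 3 ** 4 = 81
    print(sample_architecture(BLOCK_OPTIONS, K, random.Random(0)))
```

Exhaustive enumeration is only feasible for toy values like these. The count grows exponentially in k, which is why the search described next samples and mutates candidates instead of trying them all.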
- The search runs in a distributed, asynchronous fashion: each trainer runs the search algorithm (Algorithm 1) and, after searching, records its result in A.
- The search applies a mutation step, which first explores depth and then exploits.
- After finding the best architecture, it uses Algorithm 3 to produce a weighted-average ensemble of several repetitions of that candidate, each retrained from scratch with a different shuffling of the data and different initialization parameters.
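The steps in the bullets above can be sketched roughly as follows. This is a single-process toy, not the paper's Algorithm 1 or Algorithm 3. The mutation rule (append a block or swap one), the `toy_score` function standing in for trained validation accuracy, and the block names are all assumptions made for illustration. Only the overall shape follows the notes: keep a record A, mutate a good candidate, then average the predictions of several retrained copies.

```python
import random

BLOCK_OPTIONS = ["conv", "lstm", "dense"]  # hypothetical block set


def mutate(arch, options, rng):
    """Return a mutated copy: either append a block (deeper) or swap one block."""
    arch = list(arch)
    if rng.random() < 0.5:
        arch.append(rng.choice(options))   # explore depth
    else:
        i = rng.randrange(len(arch))
        arch[i] = rng.choice(options)      # change an existing block
    return tuple(arch)


def toy_score(arch):
    """Stand-in for validation accuracy; a real trainer would train and evaluate."""
    return sum(1.0 for block in arch if block == "conv") - 0.1 * len(arch)


def search(n_rounds, options, score_fn, seed=0):
    """Keep a record A of (architecture, score) pairs; mutate the current best each round."""
    rng = random.Random(seed)
    start = (rng.choice(options),)
    record = [(start, score_fn(start))]    # the record A
    for _ in range(n_rounds):
        best_arch, _ = max(record, key=lambda item: item[1])
        child = mutate(best_arch, options, rng)
        record.append((child, score_fn(child)))
    return max(record, key=lambda item: item[1]), record


def weighted_ensemble(predictions, weights):
    """Weighted average of per-model class-probability vectors."""
    if len(predictions) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    n_classes = len(predictions[0])
    return [
        sum(w * p[c] for w, p in zip(weights, predictions)) / total
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    (best_arch, best_score), record = search(20, BLOCK_OPTIONS, toy_score)
    print(best_arch, best_score)
    # Two retrained copies (p = 2) of the same architecture, equally weighted.
    print(weighted_ensemble([[0.6, 0.4], [0.2, 0.8]], [1.0, 1.0]))
```

In the distributed setting described above, several trainers would run a loop like `search` concurrently and append to a shared record. That coordination is omitted here.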
Result
Setting
The search phase uses 10 million steps.
Ensembling uses 50 million steps.
The search uses 15 trainers, each with 350 workers, and runs for one week.
Result
- Ensembling this configuration twice (p = 2) improves the accuracy from 59% to 62.77%.
- The accuracy of the searched architecture is compared with that of human-designed architectures. The searched architectures are smaller and converge faster than the existing ones.
- The blocks of the searched architecture are shown.
- Performance is plotted against training steps.