Improving Keyword Spotting and Language Identification via Neural Architecture Search at Scale

Posted on May 11, 2022


Current problems

  1. Designing a new network by hand is hard, and evaluating each design is computationally expensive.
  2. Search-space design is biased toward human expertise, so the final model can be sub-optimal.


The main contribution of this paper is a novel approach to searching for DNN architectures that addresses the two problems above by

  1. defining an incremental search (over towers and blocks);

  2. using transferable training (weight sharing);

  3. using a set of generic neural network blocks (e.g., recurrent or convolutional layers);

  4. using ensembling to reduce the parameters of a single architecture.

    Ensembling is a well-studied, domain-agnostic technique, and it is a natural way to increase the size of the network once a well-performing model architecture has been found.

The searched architecture has fewer parameters and higher performance.


The algorithm first defines an architecture as a combination of k blocks, where each block has n options.

It then searches for the best architecture among all possible candidates.
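
To get a feel for the size of this search space, here is a minimal sketch. It assumes each of the k blocks is chosen independently from the same n options, which gives n^k candidates. The block names and the `search_space` helper are hypothetical, not taken from the paper.

```python
import itertools


def search_space(k, options):
    """Return an iterator over every architecture with exactly k blocks.

    Each architecture is a tuple of k block choices, one per position.
    Assumption: every position can take any of the options independently.
    """
    return itertools.product(options, repeat=k)


# Hypothetical block options; the paper's real block set may differ.
options = ["conv", "rnn", "dense"]
k = 4

candidates = list(search_space(k, options))

# With n options per block and k blocks there are n ** k candidates.
print(len(candidates), len(options) ** k)
```

Even for small k and n the count grows quickly, which is why the candidates are searched rather than trained exhaustively.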

  1. The search runs in a distributed, asynchronous fashion: each trainer runs the search algorithm (Algorithm 1) and, after each search, records its result in A.
  2. The search applies mutations, first exploring the depth and then exploiting.
  3. After finding the best architecture, Algorithm 3 produces a weighted-average ensemble of several repetitions of that candidate, retraining each from scratch with a different shuffling of the data and different initialization parameters.
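
The steps above can be sketched roughly as follows. This is a single-process toy, not the paper's Algorithm 1 or Algorithm 3: the mutation rule, the `score` function, and all helper names are assumptions made for illustration, and `history` only plays the role of the shared record A. The real system is distributed and asynchronous, and it trains each candidate instead of calling a cheap scoring function.

```python
import random


def mutate(arch, options, rng, max_blocks):
    """Return a mutated copy of arch (hypothetical mutation rule).

    Either append a block (explore depth) or replace one existing
    block (exploit the current depth).
    """
    child = list(arch)
    if len(child) < max_blocks and rng.random() < 0.5:
        child.append(rng.choice(options))
    else:
        i = rng.randrange(len(child))
        child[i] = rng.choice(options)
    return tuple(child)


def search(score, options, steps, max_blocks, seed=0):
    """Greedy mutation search; returns the best architecture and all scores."""
    rng = random.Random(seed)
    best = (rng.choice(options),)
    history = {best: score(best)}  # stands in for the shared record A
    for _ in range(steps):
        child = mutate(best, options, rng, max_blocks)
        if child not in history:
            history[child] = score(child)
        if history[child] > history[best]:
            best = child
    return best, history


def weighted_average(predictions, weights):
    """Combine per-model class scores into one weighted-average prediction."""
    total = sum(weights)
    n_classes = len(predictions[0])
    return [
        sum(w * p[i] for p, w in zip(predictions, weights)) / total
        for i in range(n_classes)
    ]


# Toy score (an assumption): reward architectures with more "conv" blocks.
best, history = search(lambda a: a.count("conv"), ["conv", "rnn"], 200, 3)

# Two repetitions of the same candidate (p = 2), averaged with equal weights.
ensemble = weighted_average([[0.2, 0.8], [0.6, 0.4]], [1.0, 1.0])
```

In the ensembling step each repetition would be the same architecture retrained from scratch with a different data shuffle and initialization, so the averaged members differ only in their learned weights.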




The search phase uses 10 million steps.

Ensembling is invoked using 50 million steps.

The search uses 15 trainers, each with 350 workers, and runs for one week.



  1. Ensembling this configuration twice (p = 2) improves the accuracy from 59% to 62.77%.

  2. The accuracy of the searched architectures is compared with that of human-designed architectures.

    The searched architectures are smaller and converge faster than the existing ones.

  3. The blocks of the searched architecture are shown.

  4. Performance is reported against the number of training steps.





