Improving Keyword Spotting and Language Identification via Neural Architecture Search at Scale

Posted on May 11, 2022 · 1 minute read

Introduction

Current problems

  1. It is hard to design a new network by hand, and the computation required is expensive.
  2. Search-space design is biased toward human expertise, so the final model is sub-optimal.

Contribution

The main contribution of this paper is a novel approach to searching for DNN architectures that addresses the two problems above by

  1. defining an incremental search (towers built from blocks);

  2. using transferable training (weight sharing; see the sketch at the end of this section);

  3. using a set of generic neural network blocks (a recurrent or a convolutional layer);

  4. using ensembling to keep the parameter count of each single architecture small.

    Ensembling is a well-studied technique; it is domain-agnostic and a natural way to increase the size of the network, given a well-performing model architecture.

The searched architecture has fewer parameters and higher performance than the hand-designed baselines.
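
The transferable-training idea (item 2 above) can be illustrated with a minimal weight-sharing sketch. This is an assumption-heavy illustration, not the paper's implementation: the block identifiers and the `transfer_weights` helper are hypothetical.

```python
import copy

def transfer_weights(parent_weights, parent_blocks, child_blocks):
    """Copy trained weights for every block position the child shares with its parent.

    parent_weights: dict mapping block position -> trained parameters
    parent_blocks / child_blocks: block identifiers, e.g. ["lstm_128", "conv_5x5"]
    Unchanged blocks start from the parent's weights; new or mutated blocks are
    left out and get a fresh random initialization from the trainer.
    """
    child_weights = {}
    for position, block in enumerate(child_blocks):
        if position < len(parent_blocks) and parent_blocks[position] == block:
            child_weights[position] = copy.deepcopy(parent_weights[position])
    return child_weights
```

Sharing weights this way lets a mutated candidate start training from an already useful state instead of from scratch.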

Algorithm

The algorithm first defines the architecture as a combination of k blocks. Each block has n options.

It then searches for the best architecture over all possible candidates.
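
For concreteness, a search space of k blocks with n options each might be represented as below; the specific block names and the values of k and n are assumptions for illustration only.

```python
import itertools

# Hypothetical set of n generic block options (convolutional or recurrent).
BLOCK_OPTIONS = ["conv_3x3", "conv_5x5", "lstm_64", "lstm_128", "identity"]

K = 4  # number of blocks stacked into one tower (assumed value)

def all_candidates(block_options=BLOCK_OPTIONS, k=K):
    """Enumerate every candidate architecture: n**k combinations of k blocks."""
    return itertools.product(block_options, repeat=k)

# Example candidate tower: ('conv_3x3', 'lstm_64', 'lstm_64', 'identity')
print(sum(1 for _ in all_candidates()))  # 5**4 = 625 candidates
```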

  1. The search runs in a distributed, asynchronous fashion: each trainer runs the search algorithm (Algorithm 1) and, after evaluating a candidate, records it in the shared archive A.
  2. Each search step mutates a candidate, first exploring the depth of the network and then exploiting the best candidates found so far (see the sketch after this list).
  3. After finding the best architecture, Algorithm 3 produces a weighted-average ensemble of a number of repetitions of that candidate, retraining them from scratch with different shuffling of the data and different initialization parameters.
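
The explore-then-exploit mutation loop run by a single trainer might look roughly like the sketch below. Everything here is a simplified assumption: the mutation rules, the exploration budget, and the `train_and_score` callback stand in for the paper's actual search procedure (Algorithm 1).

```python
import random

def mutate(candidate, block_options, max_depth, explore):
    """Explore: make the tower deeper; exploit: swap a single block."""
    child = list(candidate)
    if explore and len(child) < max_depth:
        child.append(random.choice(block_options))            # depth exploration
    else:
        child[random.randrange(len(child))] = random.choice(block_options)
    return tuple(child)

def trainer_loop(archive, block_options, train_and_score, steps, explore_steps, max_depth=8):
    """One asynchronous trainer: mutate a parent, evaluate the child, record it in the shared archive A."""
    for step in range(steps):
        exploring = step < explore_steps
        if not archive:
            parent = (random.choice(block_options),)           # seed with a one-block tower
        elif exploring:
            parent = random.choice(list(archive))              # explore: any recorded candidate
        else:
            parent = max(archive, key=archive.get)             # exploit: best candidate so far
        child = mutate(parent, block_options, max_depth, exploring)
        archive[child] = train_and_score(child)                # record (candidate, score) in A
```

In the distributed setting, many trainers would run this loop concurrently against the same shared archive.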

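The ensembling step (Algorithm 3 above) could be sketched as follows: retrain p copies of the winning candidate from scratch with different data shuffling and initialization, then average their predictions. The weighting scheme, the `retrain_fn` callback, and the `predict` interface are assumptions for illustration.

```python
import numpy as np

def build_ensemble(best_candidate, retrain_fn, p=2):
    """Retrain the best architecture p times; each seed implies a different shuffle/initialization."""
    return [retrain_fn(best_candidate, seed=i) for i in range(p)]

def ensemble_predict(models, inputs, weights=None):
    """Weighted average of the members' class probabilities."""
    if weights is None:
        weights = np.ones(len(models)) / len(models)            # plain average by default
    probs = np.stack([m.predict(inputs) for m in models])       # shape: (p, batch, classes)
    return np.tensordot(np.asarray(weights), probs, axes=1)     # shape: (batch, classes)
```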

Result

Setting

The search phase uses 10 million training steps.

Ensembling is invoked with 50 million training steps.

The search uses 15 trainers, each with 350 workers, and runs for one week.


Result

  1. When ensembling this configuration twice (p = 2), we improve the accuracy from 59% to 62.77%.

  2. Comparing the accuracy of the searched architectures with human-designed architectures:

    The searched architectures are smaller and converge faster than the existing ones.

  3. The blocks of the best searched architecture are shown.

  4. Performance versus training steps.

