DARTS: DIFFERENTIABLE ARCHITECTURE SEARCH

Posted on April 27, 2022 · 2 minute read

Introduction

Current problems

Existing architecture search algorithms are computationally demanding.

  • RL-based NAS needs about 2000 GPU days (and evolution-based NAS about 3150 GPU days).
  • Even with speed-up tricks such as weight prediction, performance prediction, and weight sharing, the cost remains high.
  • The main reason is that these search methods (RL, evolution, Bayesian optimization, etc.) treat NAS as a black-box optimization problem over a discrete domain, which requires a very large number of architecture evaluations.

Contributions

DARTS approaches NAS from a different angle: instead of searching over a discrete set of candidate architectures, it relaxes the search space to be continuous, so that the architecture can be optimized with respect to its validation set performance by gradient descent.

  1. Introduces a novel algorithm for differentiable NAS based on bilevel optimization.
  2. Greatly improves search efficiency, reducing architecture discovery to a few GPU days.
  3. Good transferability: cells searched on CIFAR-10 also perform well on ImageNet.

Differentiable Architecture Search

Search Space

DARTS searches for a computation cell as the building block of the final architecture.

Each cell is a directed acyclic graph (DAG) of nodes. Each node is a latent representation (a feature tensor), and each directed edge (i, j) is associated with an operation that transforms node i.

Assume each cell has two input nodes and a single output node.

  1. The output of the cell is obtained by concatenating all of the intermediate nodes.
  2. Each intermediate node is computed from all of its predecessors (see the formula below).
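
In the paper's notation, each intermediate node sums the transformed outputs of all of its predecessors:

```latex
x^{(j)} = \sum_{i < j} o^{(i,j)}\big(x^{(i)}\big)
```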

Continuous Relaxation and Optimization

Instead of choosing a single operation per edge, DARTS places a softmax over all candidate operations, parameterized by continuous architecture variables α (one vector per edge). Architecture search then reduces to learning α jointly with the network weights w, which the paper formulates as a bilevel optimization problem: α is optimized on the validation loss, subject to the weights w being optimal for the training loss.
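
Written out (restating the standard DARTS equations), the mixed operation on edge (i, j) and the resulting bilevel problem are:

```latex
\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}}
  \frac{\exp\big(\alpha^{(i,j)}_{o}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha^{(i,j)}_{o'}\big)}\, o(x)

\min_{\alpha}\; \mathcal{L}_{\mathrm{val}}\big(w^{*}(\alpha), \alpha\big)
\quad \text{s.t.} \quad
w^{*}(\alpha) = \operatorname*{arg\,min}_{w}\; \mathcal{L}_{\mathrm{train}}(w, \alpha)
```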

Approximate Solution

Solving the bilevel problem exactly is prohibitively expensive, so DARTS approximates it with alternating gradient descent: in each iteration, α is updated by descending the validation loss (using the current w directly in the first-order variant, or a one-step unrolled w in the second-order variant), and then w is updated by descending the training loss with the freshly updated α.

Steps 1 and 2 thus alternate, each using the α and w produced by the previous step.
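
A minimal first-order sketch of this alternation (illustrative only: the toy model, data, and hyperparameters below are placeholders, not the authors' implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedEdge(nn.Module):
    """A single edge that mixes candidate ops with softmax(alpha)."""
    def __init__(self, dim, num_ops=3):
        super().__init__()
        self.ops = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_ops))
        self.alpha = nn.Parameter(1e-3 * torch.randn(num_ops))  # architecture parameters

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)   # continuous relaxation
        return sum(w * op(x) for w, op in zip(weights, self.ops))

model = MixedEdge(dim=8)
head = nn.Linear(8, 2)
criterion = nn.CrossEntropyLoss()

w_optim = torch.optim.SGD(list(model.ops.parameters()) + list(head.parameters()),
                          lr=0.05, momentum=0.9)
a_optim = torch.optim.Adam([model.alpha], lr=3e-4, weight_decay=1e-3)

def toy_batch():
    # Stand-in for real training/validation batches.
    return torch.randn(16, 8), torch.randint(0, 2, (16,))

for step in range(100):
    # Step 1: update alpha by descending the validation loss
    # (first-order approximation, i.e. the unrolling coefficient xi = 0).
    x_val, y_val = toy_batch()
    a_optim.zero_grad()
    criterion(head(model(x_val)), y_val).backward()
    a_optim.step()

    # Step 2: update the weights w by descending the training loss,
    # using the alpha that was just updated.
    x_trn, y_trn = toy_batch()
    w_optim.zero_grad()
    criterion(head(model(x_trn)), y_trn).backward()
    w_optim.step()
```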


Deriving discrete architecture

To form each node in the discrete architecture, retain the top-k strongest operations (each coming from a distinct previous node) among all non-zero candidate operations collected from all previous nodes, where the strength of an operation is its softmax weight as defined above.
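
A small illustrative sketch of this selection step (the `alphas[(i, j)]` layout, the op names, and k = 2 are assumptions for illustration, not the paper's code):

```python
import torch.nn.functional as F

# Candidate operations; "zero" is excluded when picking the strongest op per edge.
OPS = ["sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5",
       "max_pool_3x3", "avg_pool_3x3", "zero"]

def derive_cell(alphas, num_intermediate=4, k=2):
    """alphas[(i, j)]: logit tensor over OPS for edge i -> j; nodes 0 and 1 are the cell inputs."""
    cell = {}
    for j in range(2, 2 + num_intermediate):          # intermediate nodes
        candidates = []
        for i in range(j):                            # all previous nodes
            w = F.softmax(alphas[(i, j)], dim=0)
            # strongest non-zero operation on edge i -> j, with its softmax strength
            strength, op = max((w[idx].item(), name)
                               for idx, name in enumerate(OPS) if name != "zero")
            candidates.append((strength, i, op))
        # retain the top-k strongest operations, each coming from a distinct previous node
        cell[j] = sorted(candidates, reverse=True)[:k]
    return cell
```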

Experiments and Results

Convolutional cells for CIFAR-10

Each cell has 7 nodes. The first and second nodes of cell k are set equal to the outputs of cell k-2 and cell k-1, respectively (see the stacking sketch below).
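
A tiny sketch of this wiring (the `cell(s_prev_prev, s_prev)` call signature is a hypothetical interface used only for illustration):

```python
def stack_cells(cells, stem_out):
    # Both inputs of the first cell come from the network stem; afterwards each
    # cell k consumes the outputs of cell k-2 and cell k-1.
    s_prev_prev, s_prev = stem_out, stem_out
    for cell in cells:
        s_prev_prev, s_prev = s_prev, cell(s_prev_prev, s_prev)
    return s_prev
```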

Operations between nodes:

  • 3×3 and 5×5 separable convolutions
  • 3×3 and 5×5 dilated separable convolutions
  • 3×3 max pooling
  • 3×3 average pooling
  • identity (skip connection)
  • zero (no connection).

Convolutional operations use the ReLU-Conv-BN order.

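A short sketch of the ReLU-Conv-BN ordering (shown with a plain convolution for simplicity; the actual candidate ops use separable and dilated convolutions):

```python
import torch.nn as nn

def relu_conv_bn(c_in, c_out, kernel_size, stride=1):
    # ReLU -> Conv -> BatchNorm ordering for convolutional candidate operations.
    padding = kernel_size // 2           # preserves spatial size when stride = 1
    return nn.Sequential(
        nn.ReLU(inplace=False),
        nn.Conv2d(c_in, c_out, kernel_size, stride=stride,
                  padding=padding, bias=False),
        nn.BatchNorm2d(c_out),
    )
```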

Architecture Evaluation

To evaluate the searched architecture, the selected cell is used to build a larger network, which is trained from scratch and tested on the held-out test set.




