DARTS: DIFFERENTIABLE ARCHITECTURE SEARCH

Posted on April 27, 2022 · 2 minute read

Introduction

Current problems

Existing architecture search algorithms are computationally demanding.

  • RL-based NAS needs about 2000 GPU days (and evolution-based NAS about 3150 GPU days).
  • Even with speed-up tricks such as weight prediction, performance prediction, and weight sharing, the cost remains high.
  • The main reason is that these search methods (RL, evolution, Bayesian optimization, etc.) treat NAS as a black-box optimization problem over a discrete domain, which requires a very large number of architecture evaluations.

Contributions

DARTS approaches NAS from a different angle: instead of searching over a discrete set of candidate architectures, it relaxes the search space to be continuous, so that the architecture can be optimized with respect to its validation set performance by gradient descent.

  1. Introduces a novel algorithm for differentiable NAS based on bilevel optimization.
  2. Greatly improves search efficiency, reducing architecture discovery to a few GPU days.
  3. Good transferability: cells searched on CIFAR-10 also perform well on ImageNet.

Differentiable Architecture Search

Search Space

DARTS searches for a computation cell as the building block of the final architecture.

Each cell is a directed acyclic graph (DAG) of nodes. Each node is a latent representation (a feature tensor), and each directed edge (i, j) is associated with an operation that transforms node i.

Assume each cell has two input nodes and a single output node.

  1. The output of the cell is obtained by concatenating all of the intermediate nodes.
  2. Each intermediate node is computed from all of its predecessors (see the formula below).
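
In the paper's notation, each intermediate node sums the transformed outputs of all of its predecessors:

```latex
x^{(j)} = \sum_{i < j} o^{(i,j)}\big(x^{(i)}\big)
```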

Continuous Relaxation and Optimization

Instead of choosing a single operation per edge, DARTS places a softmax over all candidate operations, parameterized by continuous architecture variables α (one vector per edge). Architecture search then reduces to learning α jointly with the network weights w, which the paper formulates as a bilevel optimization problem: α is optimized on the validation loss, subject to the weights w being optimal for the training loss.
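
Written out (restating the standard DARTS equations), the mixed operation on edge (i, j) and the resulting bilevel problem are:

```latex
\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}}
  \frac{\exp\big(\alpha^{(i,j)}_{o}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha^{(i,j)}_{o'}\big)}\, o(x)

\min_{\alpha}\; \mathcal{L}_{\mathrm{val}}\big(w^{*}(\alpha), \alpha\big)
\quad \text{s.t.} \quad
w^{*}(\alpha) = \operatorname*{arg\,min}_{w}\; \mathcal{L}_{\mathrm{train}}(w, \alpha)
```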

Approximate Solution

Solving the bilevel problem exactly is prohibitively expensive, so DARTS approximates it with alternating gradient descent: in each iteration, α is updated by descending the validation loss (using the current w directly in the first-order variant, or a one-step unrolled w in the second-order variant), and then w is updated by descending the training loss with the freshly updated α.

Steps 1 and 2 thus alternate, each using the α and w produced by the previous step.
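
A minimal first-order sketch of this alternation (illustrative only: the toy model, data, and hyperparameters below are placeholders, not the authors' implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedEdge(nn.Module):
    """A single edge that mixes candidate ops with softmax(alpha)."""
    def __init__(self, dim, num_ops=3):
        super().__init__()
        self.ops = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_ops))
        self.alpha = nn.Parameter(1e-3 * torch.randn(num_ops))  # architecture parameters

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)   # continuous relaxation
        return sum(w * op(x) for w, op in zip(weights, self.ops))

model = MixedEdge(dim=8)
head = nn.Linear(8, 2)
criterion = nn.CrossEntropyLoss()

w_optim = torch.optim.SGD(list(model.ops.parameters()) + list(head.parameters()),
                          lr=0.05, momentum=0.9)
a_optim = torch.optim.Adam([model.alpha], lr=3e-4, weight_decay=1e-3)

def toy_batch():
    # Stand-in for real training/validation batches.
    return torch.randn(16, 8), torch.randint(0, 2, (16,))

for step in range(100):
    # Step 1: update alpha by descending the validation loss
    # (first-order approximation, i.e. the unrolling coefficient xi = 0).
    x_val, y_val = toy_batch()
    a_optim.zero_grad()
    criterion(head(model(x_val)), y_val).backward()
    a_optim.step()

    # Step 2: update the weights w by descending the training loss,
    # using the alpha that was just updated.
    x_trn, y_trn = toy_batch()
    w_optim.zero_grad()
    criterion(head(model(x_trn)), y_trn).backward()
    w_optim.step()
```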


Deriving discrete architecture

To form each node in the discrete architecture, retain the top-k strongest operations (each coming from a distinct previous node) among all non-zero candidate operations collected from all previous nodes, where the strength of an operation is its softmax weight as defined above.
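
A small illustrative sketch of this selection step (the `alphas[(i, j)]` layout, the op names, and k = 2 are assumptions for illustration, not the paper's code):

```python
import torch.nn.functional as F

# Candidate operations; "zero" is excluded when picking the strongest op per edge.
OPS = ["sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5",
       "max_pool_3x3", "avg_pool_3x3", "zero"]

def derive_cell(alphas, num_intermediate=4, k=2):
    """alphas[(i, j)]: logit tensor over OPS for edge i -> j; nodes 0 and 1 are the cell inputs."""
    cell = {}
    for j in range(2, 2 + num_intermediate):          # intermediate nodes
        candidates = []
        for i in range(j):                            # all previous nodes
            w = F.softmax(alphas[(i, j)], dim=0)
            # strongest non-zero operation on edge i -> j, with its softmax strength
            strength, op = max((w[idx].item(), name)
                               for idx, name in enumerate(OPS) if name != "zero")
            candidates.append((strength, i, op))
        # retain the top-k strongest operations, each coming from a distinct previous node
        cell[j] = sorted(candidates, reverse=True)[:k]
    return cell
```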

Experiments and Results

Convolutional cells for CIFAR-10

Each cell has 7 nodes. The first and second nodes of cell k are set equal to the outputs of cell k-2 and cell k-1, respectively (see the stacking sketch below).
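
A tiny sketch of this wiring (the `cell(s_prev_prev, s_prev)` call signature is a hypothetical interface used only for illustration):

```python
def stack_cells(cells, stem_out):
    # Both inputs of the first cell come from the network stem; afterwards each
    # cell k consumes the outputs of cell k-2 and cell k-1.
    s_prev_prev, s_prev = stem_out, stem_out
    for cell in cells:
        s_prev_prev, s_prev = s_prev, cell(s_prev_prev, s_prev)
    return s_prev
```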

Operations between nodes:

  • 3×3 and 5×5 separable convolutions
  • 3×3 and 5×5 dilated separable convolutions
  • 3×3 max pooling
  • 3×3 average pooling
  • identity (skip connection)
  • zero (no connection).

Convolutional operations use the ReLU-Conv-BN order.

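A short sketch of the ReLU-Conv-BN ordering (shown with a plain convolution for simplicity; the actual candidate ops use separable and dilated convolutions):

```python
import torch.nn as nn

def relu_conv_bn(c_in, c_out, kernel_size, stride=1):
    # ReLU -> Conv -> BatchNorm ordering for convolutional candidate operations.
    padding = kernel_size // 2           # preserves spatial size when stride = 1
    return nn.Sequential(
        nn.ReLU(inplace=False),
        nn.Conv2d(c_in, c_out, kernel_size, stride=stride,
                  padding=padding, bias=False),
        nn.BatchNorm2d(c_out),
    )
```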

Architecture Evaluation

To evaluate the searched architecture, the selected cell is used to build a larger network, which is trained from scratch and tested on the held-out test set.




