NEURAL ARCHITECTURE SEARCH ON IMAGENET IN FOUR GPU HOURS: A THEORETICALLY INSPIRED PERSPECTIVE

Posted on July 5, 2022   1 minute read

The paper argues that two quantities measured at initialization, the condition number of the neural tangent kernel (NTK) and the number of linear regions, can decouple and effectively characterize the trainability and expressivity of architectures, respectively, in complex NAS search spaces.

Method

Trainability: The trainability of a neural network indicates how effectively it can be optimized using gradient descent.

Expressivity: The expressivity of a neural network indicates how complex a function it can represent.

Trainability

For trainability, the paper first uses the NTK, together with results from prior work, to show that trainability can be evaluated at initialization.

image-20220707165606182

It then uses the eigenvalue spectrum of the NTK to rewrite Equation 1, obtaining:

image-20220707165913021

image-20220707165716082

Experiments show that K_N, the condition number of the NTK (the ratio of its largest to its smallest eigenvalue), is negatively correlated with the architecture's test accuracy.
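To make the condition number concrete, here is a minimal NumPy sketch that builds the empirical NTK of a tiny two-layer ReLU network at initialization and returns the ratio of its largest to smallest eigenvalue. The network, its width, the random inputs, and the function name `ntk_condition_number` are all assumptions for illustration; the paper works with real NAS architectures and image batches.

```python
import numpy as np

def ntk_condition_number(X, hidden=64, seed=0):
    """Condition number of the empirical NTK for f(x) = w2 . relu(W1 x).

    Hypothetical toy network, not the architectures used in the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Kaiming-normal initialization: std = sqrt(2 / fan_in)
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(hidden, d))
    w2 = rng.normal(0.0, np.sqrt(2.0 / hidden), size=hidden)

    pre = X @ W1.T                    # pre-activations, shape (n, hidden)
    act = np.maximum(pre, 0.0)        # ReLU outputs
    mask = (pre > 0).astype(float)    # ReLU derivative

    # Jacobian of the scalar output w.r.t. every parameter, one row per sample.
    J_W1 = ((mask * w2)[:, :, None] * X[:, None, :]).reshape(n, -1)
    J_w2 = act
    J = np.concatenate([J_W1, J_w2], axis=1)

    ntk = J @ J.T                     # empirical NTK Gram matrix, shape (n, n)
    eigvals = np.linalg.eigvalsh(ntk) # ascending order
    return eigvals[-1] / eigvals[0]

X = np.random.default_rng(1).normal(size=(8, 5))
kappa = ntk_condition_number(X)
print(kappa)
```

A smaller value suggests a network that is easier to train, which is the direction of the correlation reported above.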

Expressivity

For expressivity, the more linear regions a network has, the more expressive it is. For any network N with parameters theta, the paper repeatedly measures the number of linear regions, each time sampling the network parameters from the Kaiming normal initialization, and takes the average as an approximation of the expectation.

image-20220707170646408
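The averaging procedure can be sketched roughly as follows. This version counts the distinct ReLU activation patterns reached by a set of sample inputs, which only gives a lower-bound estimate of the number of linear regions. The bias-free MLP, its layer widths, the random inputs, and both function names are assumptions made for this sketch, not the paper's code.

```python
import numpy as np

def count_linear_regions(X, widths=(16, 16), seed=0):
    """Count distinct ReLU activation patterns hit by the rows of X."""
    rng = np.random.default_rng(seed)
    h = X
    fan_in = X.shape[1]
    patterns = []
    for width in widths:
        # Kaiming-normal initialization: std = sqrt(2 / fan_in)
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, width))
        pre = h @ W
        patterns.append(pre > 0)      # which units are active for each input
        h = np.maximum(pre, 0.0)
        fan_in = width
    # Inputs sharing the same on/off pattern lie in the same linear region.
    codes = np.concatenate(patterns, axis=1).astype(np.uint8)
    return len(np.unique(codes, axis=0))

def expected_linear_regions(X, repeats=3, widths=(16, 16)):
    """Average the count over several random initializations."""
    counts = [count_linear_regions(X, widths=widths, seed=s) for s in range(repeats)]
    return float(np.mean(counts))

X = np.random.default_rng(0).normal(size=(200, 4))
print(expected_linear_regions(X))
```

The count can never exceed the number of sampled inputs, so a larger sample gives a tighter estimate at higher cost.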

Algorithm

Finally, the paper proposes a way to combine R (the number of linear regions) and K (the NTK condition number) into a single criterion for selecting the best architecture.

image-20220707171101758

image-20220707172531402
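One simple way to combine two indicators with very different scales is to compare their relative rankings instead of their raw values. The sketch below does that for a list of candidates: a lower K is better, a higher R is better, and the two ranks are summed. This is a simplified illustration under my own assumptions (the function names and the toy numbers are hypothetical); the paper's actual search procedure is the one shown in the figures above.

```python
import numpy as np

def combined_rank_score(kappas, regions):
    """Sum of per-indicator ranks; a lower score means a better candidate."""
    kappas = np.asarray(kappas, dtype=float)
    regions = np.asarray(regions, dtype=float)
    rank_k = np.argsort(np.argsort(kappas))    # rank 0 = smallest K (most trainable)
    rank_r = np.argsort(np.argsort(-regions))  # rank 0 = largest R (most expressive)
    return rank_k + rank_r

def pick_best(kappas, regions):
    """Index of the candidate with the lowest combined rank."""
    return int(np.argmin(combined_rank_score(kappas, regions)))

# Toy values for three hypothetical candidate architectures.
kappas = [50.0, 10.0, 30.0]
regions = [100.0, 300.0, 200.0]
print(combined_rank_score(kappas, regions))  # [4 0 2]
print(pick_best(kappas, regions))            # 1
```

Working with ranks avoids having to choose a weight that balances the magnitudes of K and R against each other.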

Experiments

image-20220707171353530

At the beginning of the search, K_N is decreased to improve trainability, while the expressivity decreases a little.

The paper then evaluates the algorithm in both concrete and continuous search spaces.

For NAS-Bench-201:

image-20220707172216855

For the DARTS search space:

image-20220707172232113




