Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective
1 minute read ∼ Filed in: A paper note

The paper shows that the condition number of the Neural Tangent Kernel (NTK) and the number of linear regions are two decoupled indicators that effectively characterize, respectively, the trainability and the expressivity of architectures in complex NAS search spaces.
Method
Trainability: The trainability of a neural network indicates how effectively it can be optimized by gradient descent.
Expressivity: The expressivity of a neural network indicates how complex a function it can represent.
Trainability
For trainability, the paper first uses the NTK and conclusions from prior work to show that trainability can be evaluated at initialization, without training the network.
It then applies the spectral decomposition of the NTK, $\Theta = \sum_i \lambda_i u_i u_i^\top$, to Equation 1 (the closed-form training dynamics of a wide network under gradient descent) and obtains

$$u_i^\top \mu_t(X_{train}) = \left(1 - e^{-\eta \lambda_i t}\right) u_i^\top Y_{train},$$

so each eigendirection of the NTK converges at its own rate $\lambda_i$. The condition number $\kappa_N = \lambda_0 / \lambda_m$, where $\lambda_0 \ge \dots \ge \lambda_m$ are the NTK's eigenvalues, therefore summarizes how unevenly the different directions are learned.
Experiments show that $\kappa_N$ is negatively correlated with the architecture's test accuracy: the smaller $\kappa_N$, the more trainable the network.
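As a concrete illustration, here is a minimal sketch of how one might estimate $\kappa_N$ from the empirical NTK of a small PyTorch model at initialization. This is not the paper's official implementation; the toy MLP, the batch size, and the per-sample Jacobian loop (with outputs summed to a scalar per sample) are all illustrative assumptions.

```python
import torch
import torch.nn as nn

def ntk_condition_number(net, x):
    # Empirical NTK at initialization:
    # Theta[i, j] = <df(x_i)/dtheta, df(x_j)/dtheta>.
    # Each sample's outputs are summed to a scalar, a common simplification.
    grads = []
    for i in range(x.shape[0]):
        out = net(x[i:i + 1]).sum()
        g = torch.autograd.grad(out, net.parameters())
        grads.append(torch.cat([p.reshape(-1) for p in g]))
    jac = torch.stack(grads)             # (N, num_params) Jacobian
    theta = jac @ jac.t()                # (N, N) empirical NTK
    eig = torch.linalg.eigvalsh(theta)   # eigenvalues in ascending order
    return (eig[-1] / eig[0]).item()     # kappa_N = lambda_0 / lambda_m

# toy usage: score an untrained MLP on a random batch
net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
print(ntk_condition_number(net, torch.randn(8, 16)))
```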
Expressivity
As for expressivity, the more linear regions a network has, the more expressive it is. For any network $\mathcal{N}$ with parameters $\theta$, the paper estimates the expected number of linear regions $R_N$ by repeatedly sampling $\theta$ from the Kaiming normal initialization, counting the linear regions of each sample, and averaging the counts, as sketched below.
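A rough sketch of this measurement for a plain feed-forward ReLU network: counting distinct ReLU activation patterns over a batch of probe inputs gives a lower-bound proxy for the number of linear regions. The `make_net` factory, the probe batch, and the number of re-initializations are hypothetical choices, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def count_activation_patterns(net, x):
    # Each distinct ReLU on/off pattern over the probe inputs corresponds
    # to (at least) one linear region; assumes `net` is an nn.Sequential.
    patterns = []
    h = x
    for layer in net:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            patterns.append((h > 0).flatten(1))
    codes = torch.cat(patterns, dim=1).to(torch.int8)  # one binary code per input
    return torch.unique(codes, dim=0).shape[0]

def expected_num_regions(make_net, x, n_init=5):
    # Average the count over several Kaiming-normal re-initializations,
    # approximating the expectation over theta described above.
    counts = []
    for _ in range(n_init):
        net = make_net()
        for p in net.parameters():
            if p.dim() > 1:
                nn.init.kaiming_normal_(p)
        with torch.no_grad():
            counts.append(count_activation_patterns(net, x))
    return sum(counts) / len(counts)

# toy usage with a hypothetical architecture factory
make_net = lambda: nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU())
print(expected_num_regions(make_net, torch.randn(128, 16)))
```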
Algorithm
Finally, the paper proposes a way to combine $R_N$ and $\kappa_N$ into a single score used to select the best architecture.
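Since $\kappa_N$ and $R_N$ live on very different scales, a natural way to combine them, in the spirit of the paper, is through relative rankings rather than raw values. Below is a minimal rank-sum sketch; the function name and the toy numbers are assumptions for illustration.

```python
def rank_combine(kappas, regions):
    # Rank by kappa_N ascending (smaller = more trainable) and by
    # R_N descending (larger = more expressive), then sum the two ranks.
    n = len(kappas)
    rank_k = {i: r for r, i in enumerate(sorted(range(n), key=lambda i: kappas[i]))}
    rank_r = {i: r for r, i in enumerate(sorted(range(n), key=lambda i: -regions[i]))}
    return [rank_k[i] + rank_r[i] for i in range(n)]  # lower total rank = better

# pick the best of three candidate architectures
scores = rank_combine([120.0, 45.0, 300.0], [900, 700, 1200])
best = min(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```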
Experiments
At the beginning of the search, $\kappa_N$ decreases, improving trainability, while expressivity decreases only a little.
The paper then evaluates the algorithm in both a discrete and a continuous search space.
For NAS-Bench-201, the searched architectures are competitive with prior NAS methods at a far lower search cost.
For the DARTS search space, the method finds strong architectures on CIFAR-10 and ImageNet, requiring only about four GPU hours on ImageNet as the title suggests.