Graph Masked Autoencoder Enhanced Predictor for Neural Architecture Search
1 minute read ∼ Filed in: A paper note

Main Idea
Problem
It is challenging to train an accurate performance predictor with only a few architecture evaluations, which is what efficient NAS requires.
Most existing work ignores the untrained (unlabeled) architectures in the search space, thus missing an opportunity for improvement.
Solution
The paper constructs an architecture performance predictor in two stages:
- First, pre-train the model on the many unlabeled data (untrained architectures) in the search space, using a masked autoencoder (see the sketch after this list):
  - Encoder: a GAT takes the architecture graph (with some vertices masked) as input and outputs a new graph of vertex representations.
  - Decoder: a linear projection decodes the vertex features and predicts the operation type of each masked vertex.
  - Objective function: \(L = -\frac{1}{N_{mv}} \sum_{i \in mv} \sum_{c=1}^{C} y_{ic} \log p_{ic}\), where \(y_{ic}\) is 1 if the real category of vertex \(i\) is \(c\) and 0 otherwise, \(p_{ic}\) is the predicted probability of category \(c\), \(mv\) is the set of masked vertices, and \(N_{mv}\) is their number.
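A minimal sketch of this pre-training stage, assuming PyTorch Geometric's `GATConv` for the encoder; the class name `GMAEPretrainer`, the vocabulary size `NUM_OPS`, the hidden width, and the `[MASK]` token convention are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GATConv

NUM_OPS = 7   # size of the operation vocabulary (assumed)
HIDDEN = 64   # embedding width (assumed)

class GMAEPretrainer(nn.Module):
    """Masked autoencoder over architecture graphs: GAT encoder + linear decoder."""
    def __init__(self, num_ops=NUM_OPS, hidden=HIDDEN, heads=4):
        super().__init__()
        self.op_embed = nn.Embedding(num_ops + 1, hidden)  # +1 slot for the [MASK] token
        self.mask_id = num_ops
        # Encoder: two GAT layers over the architecture graph
        self.gat1 = GATConv(hidden, hidden, heads=heads, concat=False)
        self.gat2 = GATConv(hidden, hidden, heads=heads, concat=False)
        # Decoder: linear projection back to operation-type logits
        self.decoder = nn.Linear(hidden, num_ops)

    def forward(self, ops, edge_index, mask):
        # Replace the operations of masked vertices with the [MASK] token
        ops = ops.clone()
        ops[mask] = self.mask_id
        x = self.op_embed(ops)
        x = F.elu(self.gat1(x, edge_index))
        x = self.gat2(x, edge_index)
        return self.decoder(x)  # per-vertex operation logits

def masked_ce_loss(logits, ops, mask):
    # L = -(1/N_mv) * sum_{i in mv} sum_c y_ic * log p_ic,
    # i.e. cross-entropy averaged over the masked vertices only.
    return F.cross_entropy(logits[mask], ops[mask])

# Toy usage on a 5-vertex architecture DAG with two masked vertices.
model = GMAEPretrainer()
ops = torch.randint(0, NUM_OPS, (5,))
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
mask = torch.tensor([False, True, False, True, False])
loss = masked_ce_loss(model(ops, edge_index, mask), ops, mask)
```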
- Then, fine-tune the model on the labeled dataset (see the sketch after this list):
  - Discard the decoder and use only the encoder with its pre-trained parameters, plus fully connected layers on top.
  - Fine-tuning updates the fully connected layers' weights only.
  - Objective function: the end-to-end fine-tuning is trained to predict the ranking of architectures rather than their absolute performance.
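A minimal sketch of the fine-tuning stage, continuing the classes from the pre-training sketch above (`GMAEPretrainer`, `HIDDEN`). The note only says the objective predicts ranking rather than performance; the pairwise hinge loss below is one common choice of ranking loss, not necessarily the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RankingPredictor(nn.Module):
    """Frozen pre-trained encoder + trainable fully connected head."""
    def __init__(self, pretrained: GMAEPretrainer, hidden=HIDDEN):
        super().__init__()
        self.encoder = pretrained            # reuse pre-trained parameters
        for p in self.encoder.parameters():  # freeze: only the FC head is updated
            p.requires_grad = False
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, ops, edge_index):
        # Run the encoder layers directly (no masking, decoder discarded).
        x = self.encoder.op_embed(ops)
        x = F.elu(self.encoder.gat1(x, edge_index))
        x = self.encoder.gat2(x, edge_index)
        g = x.mean(dim=0)                    # mean-pool vertices into a graph embedding
        return self.head(g).squeeze(-1)      # scalar ranking score

def pairwise_ranking_loss(score_better, score_worse, margin=0.1):
    # Push the better architecture's score above the worse one's by a margin.
    return F.relu(margin - (score_better - score_worse))

# Only the FC head's parameters go to the optimizer, matching
# "update the fully connected layers' weights only".
predictor = RankingPredictor(GMAEPretrainer())
opt = torch.optim.Adam(predictor.head.parameters(), lr=1e-3)
```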