PaSca: A Graph Neural Architecture Search System under the Scalable Paradigm
Introduction
Motivation
- GNNs do not scale well with data size or the number of message-passing steps: the exponential growth of neighborhood size leads to exponential I/O overhead, a major challenge in large-scale GNN training.
- Some work tries to train GNNs in a distributed way, but the neighbor-aggregation procedure bottlenecks training speed.
- There is no general design space for GNNs, and exploring the space of possible designs is expensive.
Contribution
The paper proposes PaSca, the first paradigm and system for systematically constructing and exploring the design space of scalable GNNs:
- Introduces a scalable graph neural architecture paradigm with three abstractions:
  - graph_aggregator: captures structural information via graph aggregation operations.
  - message_aggregator: combines different levels (hops) of structural information.
  - message_updater: generates predictions based on the multi-scale features.

  With these abstractions, the system can define a general design space and decouple graph aggregation (sampling) from training; a minimal sketch of the resulting pipeline appears in the Abstraction section below.
- Proposes a general design space consisting of 6 design dimensions, covering about 150k possible designs of scalable GNNs. The space also includes adaptive aggregation and a complementary post-processing stage.
- Proposes a search system to explore this space automatically (a simplified sketch of the search loop follows this list):
  - Suggestion engine: a multi-objective search algorithm that proposes candidate architectures.
  - Evaluation engine: evaluates the suggested architectures in a distributed manner.
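As a rough, simplified sketch of how such a suggestion/evaluation loop could work: random sampling plus Pareto filtering below stand in for the paper's actual multi-objective optimizer, and the dimension names in `DESIGN_SPACE` are my own placeholders, not the paper's exact six dimensions.

```python
import random

# Hypothetical dimension names, used only to make the sketch concrete.
DESIGN_SPACE = {
    "graph_agg_type":  ["ppr", "mean", "laplacian"],
    "graph_agg_steps": [2, 4, 8, 16],
    "message_agg":     ["concat", "mean", "max", "adaptive"],
    "updater_layers":  [2, 3, 4],
    "post_agg_type":   ["none", "ppr"],
    "post_agg_steps":  [0, 2, 4],
}

def suggest(n):
    """Suggestion engine (simplified to random sampling)."""
    return [{k: random.choice(v) for k, v in DESIGN_SPACE.items()}
            for _ in range(n)]

def evaluate(cfg):
    """Evaluation engine placeholder: would train `cfg` (possibly distributed)
    and report validation error and inference time; random numbers here."""
    return {"cfg": cfg, "error": random.random(), "time": random.random()}

def dominates(a, b):
    """a dominates b if it is no worse on both objectives and better on one."""
    return (a["error"] <= b["error"] and a["time"] <= b["time"]
            and (a["error"] < b["error"] or a["time"] < b["time"]))

def pareto_front(results):
    """Keep only the non-dominated (error, inference time) trade-offs."""
    return [r for r in results if not any(dominates(o, r) for o in results)]

print(pareto_front([evaluate(c) for c in suggest(20)]))
```

The multi-objective view is what later lets the paper report a family of searched architectures (PaSca-V1 to V3) rather than a single winner.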
Abstraction
The paper divides the GNN training process into three stages, and each stage offers many optional operations, which together define the overall search space. Many existing GNN models can be expressed as instances of this search space, as illustrated below.
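A minimal sketch of the three-stage pipeline, assuming a SIGN-like instantiation: normalized-adjacency propagation, concatenation, and a two-layer MLP are my own choices for illustration, not the paper's defaults. The key property is that graph aggregation is precomputed once, so the training stage never touches the graph structure.

```python
import torch

def graph_aggregator(adj_norm, x, k):
    """Pre-processing stage: propagate node features over the graph k times.
    adj_norm is a sparse normalized adjacency matrix; there are no learnable
    parameters, so this runs once, offline, even for very large graphs."""
    msgs = [x]
    for _ in range(k):
        msgs.append(torch.sparse.mm(adj_norm, msgs[-1]))
    return msgs                                   # messages for hops 0..k

def message_aggregator(msgs):
    """Combine multi-hop messages; concatenation is just one of the options
    (mean, max, weighted, adaptive) in the design space."""
    return torch.cat(msgs, dim=1)

class MessageUpdater(torch.nn.Module):
    """Training stage: a plain MLP over the combined messages. Because the
    aggregation was precomputed, a mini-batch is just a slice of rows."""
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(in_dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, n_classes))

    def forward(self, h):
        return self.net(h)
```

In this view, SGC roughly corresponds to keeping only the last hop with a linear updater, SIGN to concatenating all hops before an MLP, and S2GC to averaging the hops, which is why such models can be generalized from the same search space.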
Engines
Experiments
Setting
Datasets:
- citation networks (Citeseer, Cora, and PubMed)
- two social networks (Flickr and Reddit)
- co-authorship graphs (Amazon and Coauthor)
- co-purchasing network (ogbn-products)
- one short-form video recommendation graph (Industry)
Baselines: GCN, GAT, JK-Net, Res-GCN, APPNP, AP-GCN, SGC, SIGN, S2GC, and GBP
Searched Representatives
We apply multi-objective optimization targeting classification error and inference time on Cora.
Training scalability
Choose PaSca-APPNP as a representative and compare it with GraphSAGE.
Train both of them with:
- batch size 8192 for Reddit and 16384 for ogbn-products
- in stand-alone and distributed scenarios, then measure the corresponding speedups
- speedup is calculated from runtime per epoch, relative to one worker in the stand-alone scenario and two workers in the distributed scenario
- without communication cost, the expected speedup grows linearly with the number of workers (since training is asynchronous and distributed)
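My reading of that definition as a formula: speedup(n) = T(n_ref) / T(n), where T(n) is the runtime per epoch with n workers and n_ref is 1 (stand-alone) or 2 (distributed); with no communication or aggregation cost, the ideal curve is roughly linear, speedup(n) ≈ n / n_ref.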
GraphSAGE requires aggregating neighboring nodes during training, so it runs into an I/O bottleneck.
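To make the I/O argument concrete, here is a back-of-the-envelope sketch (the fan-out numbers are hypothetical, not GraphSAGE's actual defaults): with per-layer neighbor sampling, the number of nodes one target node pulls in grows multiplicatively with depth, whereas the precomputed-aggregation pipeline only reads a fixed number of feature rows per batch.

```python
# Hypothetical fan-outs per message-passing layer.
fanouts = [10, 10, 10]

touched = 1          # the target node itself
frontier = 1
for f in fanouts:    # neighbors sampled at each additional hop
    frontier *= f
    touched += frontier

print(touched)       # 1 + 10 + 100 + 1000 = 1111 nodes read for one prediction
```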
Performance-Efficiency Analysis
PaSca-V3 achieves the best performance, at roughly 4× the training time of GBP and PaSca-V1. Note that although PaSca-V1 requires the same training time as GBP, its inference time is lower than GBP's.
So we can choose among PaSca-V1 to V3 and GBP according to different requirements on predictive performance, training efficiency, and inference time.
Model Scalability
PaSca includes an adaptive message_aggregator, which can identify the different message-passing demands of individual nodes and explicitly weight each hop's graph message.
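A minimal sketch of what such node-adaptive weighting could look like: each node gets its own softmax weights over the k+1 hop messages, so nodes that need long-range information can up-weight distant hops. The gating function below is my simplification, not the paper's exact formulation.

```python
import torch

class AdaptiveMessageAggregator(torch.nn.Module):
    """Weight each hop's message per node with a learned gate (simplified)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = torch.nn.Linear(dim, 1)        # scores one hop message per node

    def forward(self, msgs):
        # msgs: list of (num_nodes, dim) tensors for hops 0..k
        h = torch.stack(msgs, dim=1)                # (num_nodes, k+1, dim)
        scores = self.gate(h).squeeze(-1)           # (num_nodes, k+1)
        weights = torch.softmax(scores, dim=1)      # per-node weights over hops
        return (weights.unsqueeze(-1) * h).sum(1)   # (num_nodes, dim)
```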