PaSca: A Graph Neural Architecture Search System under the Scalable Paradigm

Posted on May 8, 2022 · 2 minute read


Introduction

Motivation

  1. GNNs cannot scale well with data size or the number of message-passing steps: the exponential growth of the neighborhood size leads to exponential I/O overhead, a major challenge in large-scale GNN training.

  2. Some works train GNNs in a distributed way, but the aggregation procedure bottlenecks training speed.

  3. There is no general design space for GNNs, and exploring a search space by hand is expensive.

Contribution

The paper proposes the first scalable paradigm and architecture search system for GNNs:

  1. Introduce a scalable graph neural architecture paradigm built on three abstractions:

    • graph_aggregator: captures structural information via graph aggregation operations.
    • message_aggregator: combines different levels of structural information.
    • message_updater: generates the prediction based on the multi-scale features.

    With these abstractions, the system can define a general design space and decouple graph aggregation (and its sampling) from model training.

  2. Propose a general design space consisting of 6 design dimensions, covering about 150k possible scalable GNN designs (a toy encoding is sketched after this list).

    The space also includes adaptive aggregation operators and a complementary post-processing stage.

  3. Propose a search system that explores this design space automatically:

    • Suggestion engine: a multi-objective search algorithm that proposes candidate designs.
    • Evaluation engine: evaluates candidate designs in a distributed manner.
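
As a rough sketch of what such a design space can look like in code, the dictionary below encodes one categorical or integer choice per dimension. The dimension names and value ranges here are my own illustrative assumptions, not the paper's exact six dimensions:

```python
# Hypothetical encoding of a scalable-GNN design space.
# Dimension names and value ranges are illustrative assumptions,
# NOT the paper's exact specification.
DESIGN_SPACE = {
    "pre_agg_type":   ["mean", "ppr", "laplacian"],    # graph_aggregator operator
    "pre_agg_steps":  list(range(1, 11)),              # propagation depth K
    "msg_aggregator": ["last", "concat", "mean", "max", "adaptive"],
    "updater_layers": [1, 2, 3],                       # MLP depth in message_updater
    "post_agg_type":  ["none", "mean", "ppr"],         # post-processing operator
    "post_agg_steps": list(range(0, 6)),
}

# The size of the space is the product of the per-dimension choices.
n_designs = 1
for choices in DESIGN_SPACE.values():
    n_designs *= len(choices)
print(n_designs)
```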

Abstraction

The paper divides the GNN training process into three stages, matching the abstractions above. Each stage offers many optional operations, which together define the overall search space.

Many existing GNN models can be expressed as instances of the defined search space.
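
To make the three stages concrete, here is a minimal PyTorch-style sketch; the function and class names are my own, and `adj` is assumed to be a normalized sparse adjacency matrix:

```python
import torch

def graph_aggregator(adj, x, k):
    """Pre-processing: propagate features k steps over the graph.
    Runs once, before training, so no sampling or graph I/O during SGD."""
    msgs = [x]
    for _ in range(k):
        x = torch.sparse.mm(adj, x)    # one aggregation step
        msgs.append(x)
    return msgs                        # multi-scale messages [X, AX, ..., A^k X]

def message_aggregator(msgs, mode="last"):
    """Combine different levels of structural information."""
    if mode == "last":
        return msgs[-1]                # e.g. SGC keeps only A^k X
    if mode == "mean":
        return torch.stack(msgs).mean(dim=0)
    if mode == "concat":
        return torch.cat(msgs, dim=1)
    raise ValueError(mode)

class MessageUpdater(torch.nn.Module):
    """Generate predictions from the aggregated multi-scale features."""
    def __init__(self, in_dim, n_classes):
        super().__init__()
        self.lin = torch.nn.Linear(in_dim, n_classes)

    def forward(self, h):
        return self.lin(h)

# SGC as one instance of the paradigm: k-step propagation, keep the
# last message, then a linear classifier (adj/features assumed given):
# msgs   = graph_aggregator(adj, features, k=2)
# h      = message_aggregator(msgs, mode="last")
# logits = MessageUpdater(h.shape[1], n_classes)(h)
```

Because the aggregated features are computed once up front, mini-batch training only indexes rows of the feature matrix, with no neighbor fetching during SGD.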


Engines

(Figure: PaSca system overview, showing the suggestion engine and the distributed evaluation engine.)
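
A sketch of how the two engines might interact is below. Random sampling stands in for the paper's multi-objective search algorithm, and the metrics are placeholders; in PaSca the evaluation engine runs distributedly across workers:

```python
import random

# Toy design space in the spirit of the dictionary sketched earlier.
DESIGN_SPACE = {
    "pre_agg_steps":  list(range(1, 11)),
    "msg_aggregator": ["last", "concat", "mean", "max", "adaptive"],
}

def suggest(space, history, n=4):
    """Suggestion engine (stand-in): random proposals. PaSca instead
    uses a multi-objective search algorithm informed by `history`."""
    return [{k: random.choice(v) for k, v in space.items()}
            for _ in range(n)]

def evaluate(design):
    """Evaluation engine (stand-in): would train `design` and measure
    (classification error, inference time); here, random placeholders."""
    return random.random(), random.random()

history = []
for _ in range(25):                     # search budget: 25 rounds
    for design in suggest(DESIGN_SPACE, history):
        history.append((design, evaluate(design)))
```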

Experiments

Setting

Datasets:

  • three citation networks (Citeseer, Cora, and PubMed)
  • two social networks (Flickr and Reddit)
  • two co-authorship graphs (Amazon and Coauthor)
  • one co-purchasing network (ogbn-products)
  • one short-form video recommendation graph (Industry)

Baselines: GCN, GAT, JK-Net, Res-GCN, APPNP, AP-GCN, SGC, SIGN, S2GC, and GBP.

Searched Representatives

The authors apply multi-objective optimization targeting classification error and inference time on Cora.
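
For concreteness, here is a generic Pareto-front filter over (classification error, inference time) pairs; it is not the paper's implementation, just the standard dominance check such a search relies on:

```python
def pareto_front(points):
    """Keep the non-dominated (error, inference_time) pairs: a point is
    dominated if another is no worse in both objectives and strictly
    better in at least one."""
    front = []
    for i, (e_i, t_i) in enumerate(points):
        dominated = any(
            e_j <= e_i and t_j <= t_i and (e_j < e_i or t_j < t_i)
            for j, (e_j, t_j) in enumerate(points) if j != i
        )
        if not dominated:
            front.append((e_i, t_i))
    return front

print(pareto_front([(0.20, 5.0), (0.15, 9.0), (0.25, 4.0), (0.15, 12.0)]))
# -> [(0.2, 5.0), (0.15, 9.0), (0.25, 4.0)]
```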

(Table: searched representative architectures, including PaSca-V1 to PaSca-V3.)

Training scalability

The authors choose PaSca-APPNP as a representative and compare it with GraphSAGE.

Both models are trained as follows:

  1. batch size of 8192 for Reddit and 16384 for ogbn-products
  2. in both stand-alone and distributed scenarios, measuring the corresponding speedups
  3. speedup is calculated from runtime per epoch, relative to one worker in the stand-alone scenario and two workers in the distributed scenario
  4. without communication cost, the expected speedup would be linear (since training is asynchronous and distributed)

(Figure: training speedup of PaSca-APPNP vs. GraphSAGE in stand-alone and distributed settings.)

GraphSAGE must aggregate neighboring nodes during training, so it hits an I/O bottleneck; PaSca-style models avoid this by completing all graph aggregation in pre-processing.

Performance-Efficiency Analysis

PaSca-V3 achieves the best predictive performance at roughly 4× the training time of GBP and PaSca-V1. Note that although PaSca-V1 requires the same training time as GBP, its inference time is lower than GBP's.

One can therefore choose among PaSca-V1 to V3, along with GBP, according to the desired trade-off between predictive performance, training efficiency, and inference time.

(Figures: predictive performance vs. training time and vs. inference time for the PaSca variants and GBP.)

Model Scalability

PaSca includes an adaptive message_aggregator, which can identify the different message-passing demands of individual nodes and explicitly weight each level of graph messages.
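
A minimal sketch of what such per-node weighting over multi-hop messages could look like is below; the gating design is an assumption for illustration, not necessarily the paper's exact operator:

```python
import torch

class AdaptiveMessageAggregator(torch.nn.Module):
    """Weight each hop's message per node with a learned gate, so that
    nodes with different message-passing demands mix hops differently."""
    def __init__(self, dim):
        super().__init__()
        self.gate = torch.nn.Linear(dim, 1)

    def forward(self, msgs):
        # msgs: list of K+1 tensors of shape [N, dim], one per hop
        h = torch.stack(msgs, dim=1)               # [N, K+1, dim]
        scores = self.gate(h).squeeze(-1)          # [N, K+1]
        weights = torch.softmax(scores, dim=1)     # normalize over hops
        return (weights.unsqueeze(-1) * h).sum(1)  # weighted sum -> [N, dim]

# Usage with pre-processed messages, e.g. msgs = [X, AX, A^2 X]:
# agg = AdaptiveMessageAggregator(dim=X.shape[1])
# h = agg(msgs)
```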




