ADGNN: Towards Scalable GNN Training with Aggregation-Difference Aware Sampling

Zhen Song, Yu Gu, Tianyi Li, Qing Sun, Yanfeng Zhang, Christian S. Jensen, Ge Yu

Publication: Contribution to journal; Conference article in journal; Research; Peer-reviewed


Distributed computing is promising to enable large-scale graph neural network (GNN) model training. However, care is needed to avoid excessive computational and communication overheads. Sampling is promising in terms of enabling scalability, and sampling techniques have been proposed to reduce training costs. However, online sampling introduces large overheads, and while offline sampling that is done only once can eliminate such overheads, it instead introduces information loss and accuracy degradation. Thus, existing sampling techniques are unable to improve simultaneously both efficiency and accuracy, particularly at low sampling rates. We develop a distributed system, ADGNN, for full-batch based GNN training that adopts a hybrid sampling architecture to enable a trade-off between efficiency and accuracy. Specifically, ADGNN employs sampling result reuse techniques to reduce the cost associated with sampling and thus improve training efficiency. To alleviate accuracy degradation, we introduce a new metric, Aggregation Difference (AD), that quantifies the gap between sampled and full neighbor set aggregation. We present so-called AD-Sampling that aims to minimize the Aggregation Difference with an adaptive sampling frequency tuner. Finally, ADGNN employs an AD-importance-based sampling technique for remote neighbors to further reduce communication costs. Experiments on five real datasets show that ADGNN is able to outperform the state-of-the-art by up to nearly 9 times in terms of efficiency, while achieving comparable accuracy to the non-sampling methods.
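
The abstract describes AD only informally, as the gap between aggregating over a sampled neighbor set and over the full neighbor set. As a rough illustration, and not the paper's formal definition, the sketch below assumes mean aggregation and measures the gap as a Euclidean distance. The function names, the toy feature table, and the sampling-rate helper are all hypothetical.

```python
import math
import random

def mean_aggregate(features, neighbor_ids):
    """Element-wise mean of the feature vectors of the given neighbors."""
    dim = len(next(iter(features.values())))
    if not neighbor_ids:
        return [0.0] * dim
    total = [0.0] * dim
    for n in neighbor_ids:
        for i, x in enumerate(features[n]):
            total[i] += x
    return [t / len(neighbor_ids) for t in total]

def aggregation_difference(features, neighbors, sampled):
    """Assumed AD: Euclidean distance between full and sampled mean aggregates."""
    full = mean_aggregate(features, neighbors)
    samp = mean_aggregate(features, sampled)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(full, samp)))

def sample_neighbors(neighbors, rate, rng):
    """Uniformly sample a fraction `rate` of the neighbors (at least one)."""
    k = max(1, round(rate * len(neighbors)))
    return rng.sample(neighbors, k)

# Toy example: one target node with four neighbors and 2-dimensional features.
features = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0], 4: [0.0, 0.0]}
neighbors = [1, 2, 3, 4]

sampled = sample_neighbors(neighbors, rate=0.5, rng=random.Random(0))
print(sampled, aggregation_difference(features, neighbors, sampled))
```

Under these assumptions, a sample whose mean matches the full neighborhood (for example neighbors 1 and 2) has a difference of zero, while a sample such as neighbor 3 alone does not. A sampler that aims to keep this quantity small would prefer the former.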
Journal: Proceedings of the ACM on Management of Data
Issue number: 4
Pages (from-to): 1-26
Status: Published - 2023

Bibliographical note

DBLP's bibliographic metadata records provided through are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.

