DynaHB: A Communication-Avoiding Asynchronous Distributed Framework with Hybrid Batches for Dynamic GNN Training

Zhen Song, Yu Gu*, Qing Sun, Tianyi Li, Yanfeng Zhang, Yushuai Li, Christian S. Jensen, Ge Yu

*Corresponding author for this work

Research output: Contribution to journal › Conference article in Journal › Research › peer-review

Abstract

Dynamic Graph Neural Networks (DGNNs) have demonstrated exceptional performance at dynamic-graph analysis tasks. However, the costs exceed those incurred by other learning tasks, to the point where deployment on large-scale dynamic graphs is infeasible. Existing distributed frameworks that facilitate DGNN training are in their early stages and experience challenges such as communication bottlenecks, imbalanced workloads, and GPU memory overflow. We introduce DynaHB, a distributed framework for DGNN training using so-called Hybrid Batches. DynaHB reduces communication by means of vertex caching, and it ensures even data and workload distribution by means of load-aware vertex partitioning. DynaHB also features a novel hybrid-batch training mode that combines vertex-batch and snapshot-batch techniques, thereby reducing training time and GPU memory usage. To further enhance the hybrid-batch-based approach, DynaHB integrates a reinforcement learning-based batch adjuster and a pipelined batch generator with a batch reservoir to reduce the cost of generating hybrid batches. Extensive experiments show that DynaHB achieves up to 93× and an average of 8.06× speedups over the state-of-the-art training framework.
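The record does not include code; the following minimal Python sketch only illustrates the general idea of a hybrid batch as described in the abstract, namely pairing a subset of vertices (vertex-batch) with a window of snapshots (snapshot-batch) so that both the vertex and time dimensions of each training step are bounded. The function and parameter names (`generate_hybrid_batches`, `vertex_batch_size`, `snapshot_batch_size`) are assumptions for illustration and are not DynaHB's actual API.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class HybridBatch:
    """Illustrative hybrid batch: a vertex subset paired with a snapshot window."""
    vertex_ids: List[int]
    snapshot_ids: List[int]


def generate_hybrid_batches(num_vertices: int,
                            num_snapshots: int,
                            vertex_batch_size: int,
                            snapshot_batch_size: int,
                            seed: int = 0) -> List[HybridBatch]:
    """Split vertices into vertex-batches and snapshots into snapshot-batches,
    then pair every combination, so each hybrid batch limits both the number
    of vertices and the number of snapshots processed at once (a sketch of
    how hybrid batching can cap per-step GPU memory)."""
    rng = random.Random(seed)
    vertices = list(range(num_vertices))
    rng.shuffle(vertices)

    vertex_batches = [vertices[i:i + vertex_batch_size]
                      for i in range(0, num_vertices, vertex_batch_size)]
    snapshot_batches = [list(range(t, min(t + snapshot_batch_size, num_snapshots)))
                        for t in range(0, num_snapshots, snapshot_batch_size)]

    return [HybridBatch(vb, sb) for sb in snapshot_batches for vb in vertex_batches]


if __name__ == "__main__":
    # Hypothetical sizes: 10,000 vertices, 16 snapshots,
    # 2,500 vertices and 4 snapshots per hybrid batch.
    batches = generate_hybrid_batches(10_000, 16, 2_500, 4)
    print(len(batches), "hybrid batches")
    print(batches[0].snapshot_ids, len(batches[0].vertex_ids))
```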

Original language: English
Journal: Proceedings of the VLDB Endowment
Volume: 17
Issue number: 11
Pages (from-to): 3388-3401
Number of pages: 14
ISSN: 2150-8097
DOIs
Publication status: Published - 2024
Event: 50th International Conference on Very Large Data Bases, VLDB 2024 - Guangzhou, China
Duration: 24 Aug 2024 - 29 Aug 2024

Conference

Conference: 50th International Conference on Very Large Data Bases, VLDB 2024
Country/Territory: China
City: Guangzhou
Period: 24/08/2024 - 29/08/2024

Bibliographical note

Publisher Copyright:
© 2024, VLDB Endowment. All rights reserved.
