Abstract
Dynamic Graph Neural Networks (DGNNs) have demonstrated exceptional performance at dynamic-graph analysis tasks. However, their training costs exceed those of other learning tasks, to the point where deployment on large-scale dynamic graphs is infeasible. Existing distributed frameworks that facilitate DGNN training are in their early stages and experience challenges such as communication bottlenecks, imbalanced workloads, and GPU memory overflow. We introduce DynaHB, a distributed framework for DGNN training using so-called Hybrid Batches. DynaHB reduces communication by means of vertex caching, and it ensures even data and workload distribution by means of load-aware vertex partitioning. DynaHB also features a novel hybrid-batch training mode that combines vertex-batch and snapshot-batch techniques, thereby reducing training time and GPU memory usage. To further enhance the hybrid-batch-based approach, DynaHB integrates a reinforcement learning-based batch adjuster and a pipelined batch generator with a batch reservoir to reduce the cost of generating hybrid batches. Extensive experiments show that DynaHB achieves up to a 93× and an average of 8.06× speedup over the state-of-the-art training framework.
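For intuition only, the sketch below (not code from the paper; all names and parameters are hypothetical) illustrates the general idea behind a hybrid batch: pairing a sampled subset of vertices with a window of consecutive snapshots, so that each training step touches only part of the graph along both the structural and the temporal dimension.

```python
import random

def make_hybrid_batches(num_vertices, num_snapshots,
                        vertex_batch_size, snapshot_window):
    """Illustrative sketch: pair vertex batches with snapshot windows.

    A 'hybrid batch' here is (vertex_ids, snapshot_range), limiting both
    the structural (vertices) and temporal (snapshots) extent of a step.
    This is an assumption-based illustration, not DynaHB's implementation.
    """
    vertices = list(range(num_vertices))
    random.shuffle(vertices)
    for v_start in range(0, num_vertices, vertex_batch_size):
        vertex_ids = vertices[v_start:v_start + vertex_batch_size]
        for t_start in range(0, num_snapshots, snapshot_window):
            t_end = min(t_start + snapshot_window, num_snapshots)
            yield vertex_ids, range(t_start, t_end)

# Example: 1000 vertices and 50 snapshots yield 8 x 5 = 40 hybrid batches.
for vertex_ids, snapshots in make_hybrid_batches(
        num_vertices=1000, num_snapshots=50,
        vertex_batch_size=128, snapshot_window=10):
    pass  # a real trainer would run the DGNN on this sub-graph slice
```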
Original language | English |
---|---|
Journal | Proceedings of the VLDB Endowment |
Volume | 17 |
Issue number | 11 |
Pages (from-to) | 3388-3401 |
Number of pages | 14 |
ISSN | 2150-8097 |
DOIs | |
Publication status | Published - 2024 |
Event | 50th International Conference on Very Large Data Bases, VLDB 2024, Guangzhou, China, 24 Aug 2024 → 29 Aug 2024 |
Conference
Conference | 50th International Conference on Very Large Data Bases, VLDB 2024 |
---|---|
Country/Territory | China |
City | Guangzhou |
Period | 24/08/2024 → 29/08/2024 |
Bibliographical note
Publisher Copyright: © 2024, VLDB Endowment. All rights reserved.