Distributed Graph Embedding with Information-Oriented Random Walks

Peng Fang, Arijit Khan, Siqiang Luo, Fang Wang, Dan Feng, Zhenli Li, Wei Yin, Yuchao Cao

Research output: Contribution to journal › Conference article in Journal › Research › peer-review

8 Citations (Scopus)
35 Downloads (Pure)

Abstract

Graph embedding maps graph nodes to low-dimensional vectors, and is widely adopted in machine learning tasks. The increasing availability of billion-edge graphs underscores the importance of learning efficient and effective embeddings on large graphs, such as link prediction on Twitter with over one billion edges. Most existing graph embedding methods fall short of reaching high data scalability. In this paper, we present a general-purpose, distributed, information-centric random walk-based graph embedding framework, DistGER, which can scale to embed billion-edge graphs. DistGER incrementally computes information-centric random walks. It further leverages a multi-proximity-aware, streaming, parallel graph partitioning strategy, simultaneously achieving high local partition quality and excellent workload balancing across machines. DistGER also improves the distributed Skip-Gram learning model to generate node embeddings by optimizing the access locality, CPU throughput, and synchronization efficiency. Experiments on real-world graphs demonstrate that compared to state-of-the-art distributed graph embedding frameworks, including KnightKing, DistDGL, and PytorchBigGraph, DistGER exhibits 2.33×–129× acceleration, 45% reduction in cross-machine communication, and >10% effectiveness improvement in downstream tasks.
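For readers unfamiliar with the random walk-based pipeline the abstract refers to, the following is a minimal single-machine sketch of its two generic stages: sampling walks from a graph, then turning each walk into (center, context) pairs of the kind a Skip-Gram model is trained on. It uses plain uniform walks of fixed length, not DistGER's information-centric walks, and it does no partitioning or distribution. The function names, the toy graph, the walk length, and the window size are all illustrative assumptions, not taken from the paper.

```python
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`.

    Assumption: plain uniform neighbor sampling with a fixed length,
    not the information-centric walks described in the abstract.
    """
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def skipgram_pairs(walk, window):
    """Return (center, context) pairs for nodes within `window` positions."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Hypothetical toy graph as an adjacency list (undirected).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rng = random.Random(0)

# One walk per node, then the training pairs a Skip-Gram model would consume.
walks = [random_walk(adj, v, 5, rng) for v in adj]
pairs = [p for w in walks for p in skipgram_pairs(w, 2)]
```

In a full system the resulting pairs would be fed to a Skip-Gram trainer to produce the node vectors; the abstract's contributions concern how the walks are computed, how the graph is partitioned across machines, and how that training step is distributed.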
Original language: English
Journal: Proceedings of the VLDB Endowment
Volume: 16
Issue number: 7
Pages (from-to): 1643-1656
Number of pages: 14
ISSN: 2150-8097
DOIs
Publication status: Published - 2023
Event: 49th International Conference on Very Large Data Bases, VLDB 2023 - Vancouver, Canada
Duration: 28 Aug 2023 to 1 Sept 2023

Conference

Conference: 49th International Conference on Very Large Data Bases, VLDB 2023
Country/Territory: Canada
City: Vancouver
Period: 28/08/2023 to 01/09/2023
