Towards distributed node similarity search on graphs

Tianming Zhang; Yunjun Gao; Baihua Zheng; Lu Chen; Shiting Wen; Wei Guo

doi:10.1007/s11280-020-00819-6

Towards distributed node similarity search on graphs

Tianming Zhang, Yunjun Gao^*, Baihua Zheng, Lu Chen, Shiting Wen, Wei Guo

^*Kontaktforfatter

Publikation: Bidrag til tidsskrift › Tidsskriftartikel › Forskning › peer review

5 Citationer (Scopus)

148 Downloads (Pure)

Abstract

Node similarity search on graphs has wide applications in recommendation, link prediction, to name just a few. However, existing studies are insufficient due to two reasons: (i) the scale of the real-world graph is growing rapidly, and (ii) vertices are always associated with complex attributes. In this paper, we propose an efficiently distributed framework to support node similarity search on massive graphs, which considers both graph structure correlation and node attribute similarity in metric spaces. The framework consists of preprocessing stage and query stage. In the preprocessing stage, a parallel KD-tree construction (KDC) algorithm is developed to form a newly defined graph so-called hybrid graph, in order to integrate node attribute similarity into the original graph. To equally divide graph vertices into subsets, KDC adopts the KD-tree partitioning after the pivot mapping. In addition, two metric pruning rules and an optimized allocation strategy are presented to reduce communication and computation costs. In the query stage, based on the formed hybrid graph, we develop similarity search methods using random walk with restart (RWR) to measure node similarity. To boost efficiency, we derive tight bounds to rapidly shrink the search region. Extensive experiments with three real massive graphs are conducted to verify the effectiveness, efficiency, and scalability of our proposed techniques.

Originalsprog	Engelsk
Tidsskrift	World Wide Web
Vol/bind	23
Udgave nummer	6
Sider (fra-til)	3025-3053
Antal sider	29
ISSN	1386-145X
DOI	https://doi.org/10.1007/s11280-020-00819-6
Status	Udgivet - 1 nov. 2020

Bibliografisk note

Publisher Copyright:
© 2020, Springer Science+Business Media, LLC, part of Springer Nature.

Adgang til dokumentet

10.1007/s11280-020-00819-6

Accepted author manuscriptAccepteret manuskript, 2,33 MBLicens: CC BY-NC-ND 4.0

AUB Link

Søg efter materialet i Aalborg Universitetsbiblioteks søgemaskine

Andre filer og links

Link to publication in Scopus

Citationsformater

@article{1795e471a97049ab91057545a4e5a75c,

title = "Towards distributed node similarity search on graphs",

abstract = "Node similarity search on graphs has wide applications in recommendation, link prediction, to name just a few. However, existing studies are insufficient due to two reasons: (i) the scale of the real-world graph is growing rapidly, and (ii) vertices are always associated with complex attributes. In this paper, we propose an efficiently distributed framework to support node similarity search on massive graphs, which considers both graph structure correlation and node attribute similarity in metric spaces. The framework consists of preprocessing stage and query stage. In the preprocessing stage, a parallel KD-tree construction (KDC) algorithm is developed to form a newly defined graph so-called hybrid graph, in order to integrate node attribute similarity into the original graph. To equally divide graph vertices into subsets, KDC adopts the KD-tree partitioning after the pivot mapping. In addition, two metric pruning rules and an optimized allocation strategy are presented to reduce communication and computation costs. In the query stage, based on the formed hybrid graph, we develop similarity search methods using random walk with restart (RWR) to measure node similarity. To boost efficiency, we derive tight bounds to rapidly shrink the search region. Extensive experiments with three real massive graphs are conducted to verify the effectiveness, efficiency, and scalability of our proposed techniques.",

keywords = "Algorithm, Distributed processing, Graph, Node similarity search",

author = "Tianming Zhang and Yunjun Gao and Baihua Zheng and Lu Chen and Shiting Wen and Wei Guo",

note = "Funding Information: This work was supported in part by the National Key R&D Program of China under Grant No. 2018YFB1004003, the NSFC under Grants No. 61972338 and 61802344, the NSFC-Zhejiang Joint Fund under Grant No. U1609217, and the ZJU-Hikvision Joint Project. Yunjun Gao is the corresponding author of the work. Publisher Copyright: {\textcopyright} 2020, Springer Science+Business Media, LLC, part of Springer Nature.",

year = "2020",

month = nov,

day = "1",

doi = "10.1007/s11280-020-00819-6",

language = "English",

volume = "23",

pages = "3025--3053",

journal = "World Wide Web",

issn = "1386-145X",

publisher = "Springer Publishing Company",

number = "6",

}

TY - JOUR

T1 - Towards distributed node similarity search on graphs

AU - Zhang, Tianming

AU - Gao, Yunjun

AU - Zheng, Baihua

AU - Chen, Lu

AU - Wen, Shiting

AU - Guo, Wei

N1 - Funding Information: This work was supported in part by the National Key R&D Program of China under Grant No. 2018YFB1004003, the NSFC under Grants No. 61972338 and 61802344, the NSFC-Zhejiang Joint Fund under Grant No. U1609217, and the ZJU-Hikvision Joint Project. Yunjun Gao is the corresponding author of the work. Publisher Copyright: © 2020, Springer Science+Business Media, LLC, part of Springer Nature.

PY - 2020/11/1

Y1 - 2020/11/1

N2 - Node similarity search on graphs has wide applications in recommendation, link prediction, to name just a few. However, existing studies are insufficient due to two reasons: (i) the scale of the real-world graph is growing rapidly, and (ii) vertices are always associated with complex attributes. In this paper, we propose an efficiently distributed framework to support node similarity search on massive graphs, which considers both graph structure correlation and node attribute similarity in metric spaces. The framework consists of preprocessing stage and query stage. In the preprocessing stage, a parallel KD-tree construction (KDC) algorithm is developed to form a newly defined graph so-called hybrid graph, in order to integrate node attribute similarity into the original graph. To equally divide graph vertices into subsets, KDC adopts the KD-tree partitioning after the pivot mapping. In addition, two metric pruning rules and an optimized allocation strategy are presented to reduce communication and computation costs. In the query stage, based on the formed hybrid graph, we develop similarity search methods using random walk with restart (RWR) to measure node similarity. To boost efficiency, we derive tight bounds to rapidly shrink the search region. Extensive experiments with three real massive graphs are conducted to verify the effectiveness, efficiency, and scalability of our proposed techniques.

AB - Node similarity search on graphs has wide applications in recommendation, link prediction, to name just a few. However, existing studies are insufficient due to two reasons: (i) the scale of the real-world graph is growing rapidly, and (ii) vertices are always associated with complex attributes. In this paper, we propose an efficiently distributed framework to support node similarity search on massive graphs, which considers both graph structure correlation and node attribute similarity in metric spaces. The framework consists of preprocessing stage and query stage. In the preprocessing stage, a parallel KD-tree construction (KDC) algorithm is developed to form a newly defined graph so-called hybrid graph, in order to integrate node attribute similarity into the original graph. To equally divide graph vertices into subsets, KDC adopts the KD-tree partitioning after the pivot mapping. In addition, two metric pruning rules and an optimized allocation strategy are presented to reduce communication and computation costs. In the query stage, based on the formed hybrid graph, we develop similarity search methods using random walk with restart (RWR) to measure node similarity. To boost efficiency, we derive tight bounds to rapidly shrink the search region. Extensive experiments with three real massive graphs are conducted to verify the effectiveness, efficiency, and scalability of our proposed techniques.

KW - Algorithm

KW - Distributed processing

KW - Graph

KW - Node similarity search

UR - http://www.scopus.com/inward/record.url?scp=85086665268&partnerID=8YFLogxK

U2 - 10.1007/s11280-020-00819-6

DO - 10.1007/s11280-020-00819-6

M3 - Journal article

AN - SCOPUS:85086665268

SN - 1386-145X

VL - 23

SP - 3025

EP - 3053

JO - World Wide Web

JF - World Wide Web

IS - 6

ER -

Towards distributed node similarity search on graphs

Abstract

Bibliografisk note

Adgang til dokumentet

AUB Link

Andre filer og links

Fingeraftryk

Citationsformater