Efficient Distributed Clustering Algorithms on Star-Schema Heterogeneous Graphs

Lu Chen, Yunjun Gao, Xingrui Huang, Christian S. Jensen, Bolong Zheng

Publication: Contribution to journal, Journal article, Research, peer-reviewed

4 Citations (Scopus)
109 Downloads (Pure)

Abstract

Graph clustering can provide useful insights into the structure of data. To improve clustering quality, node attributes can be taken into account, which yields attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, in which attributes are modeled as distinct types of graph nodes. This enables the use of personalized PageRank (PPR) as a unified distance measure that captures both structural and attribute similarities. We employ DBSCAN for clustering and update edge weights iteratively to balance the importance of different attributes. The rapidly growing volume of data challenges traditional clustering algorithms, so a distributed method is required. We therefore adopt Blogel, a popular distributed graph computing system, on top of which we develop four exact and approximate approaches that enable efficient PPR score computation when edge weights are updated. To improve the effectiveness of the clustering, we propose a simple yet effective entropy-based edge weight update strategy. We also present a game-theory-based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets demonstrate the effectiveness and efficiency of our proposals.
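To make the central idea concrete, the following is a minimal single-machine sketch of how attributes can be turned into graph nodes so that one PPR computation reflects both link structure and shared attribute values. It is not the paper's Blogel-based method: the function names, the tuple encoding of nodes, the restart probability `alpha = 0.15`, the uniform `attr_weight`, and the toy data are all assumptions made for illustration, and plain power iteration stands in for the exact and approximate approaches the abstract mentions.

```python
from collections import defaultdict


def build_star_schema_graph(edges, attributes, attr_weight=1.0):
    """Build a weighted, undirected adjacency map {node: {neighbor: weight}}.

    Structural nodes are encoded as ('v', id) and attribute-value nodes as
    ('a', name, value). This encoding is an assumption of this sketch.
    """
    adj = defaultdict(dict)
    for u, v in edges:
        adj[('v', u)][('v', v)] = 1.0
        adj[('v', v)][('v', u)] = 1.0
    for node, attrs in attributes.items():
        for name, value in attrs.items():
            attr_node = ('a', name, value)
            # Every node sharing this attribute value links to the same hub.
            adj[('v', node)][attr_node] = attr_weight
            adj[attr_node][('v', node)] = attr_weight
    return dict(adj)


def personalized_pagerank(adj, source, alpha=0.15, iterations=100):
    """Power iteration for PPR with restart probability alpha at `source`.

    Assumes every node in `adj` has at least one neighbor.
    """
    scores = {node: 0.0 for node in adj}
    scores[source] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in adj}
        nxt[source] = alpha  # restart mass returns to the source
        for node, nbrs in adj.items():
            total = sum(nbrs.values())
            for nbr, weight in nbrs.items():
                nxt[nbr] += (1.0 - alpha) * scores[node] * weight / total
        scores = nxt
    return scores


# Toy data (hypothetical): two linked pairs joined by one bridge edge.
edges = [(1, 2), (2, 3), (3, 4)]
attributes = {
    1: {"topic": "db"},
    2: {"topic": "db"},
    3: {"topic": "ml"},
    4: {"topic": "ml"},
}

graph = build_star_schema_graph(edges, attributes)
scores = personalized_pagerank(graph, source=('v', 1))

# Rank the structural nodes by their PPR score with respect to node 1.
ranking = sorted(
    (n for n in scores if n[0] == 'v'),
    key=lambda n: scores[n],
    reverse=True,
)
print(ranking)
```

In this sketch, node 2 is reachable from node 1 both through a direct edge and through the shared `topic=db` attribute node, so it receives more PPR mass than node 4. A density-based clusterer such as DBSCAN could then operate on distances derived from these scores, and raising or lowering `attr_weight` per attribute type is the kind of knob that an edge weight update strategy would adjust.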
Original language: English
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 34
Issue number: 10
Pages (from-to): 4781-4796
Number of pages: 16
ISSN: 1041-4347
DOI
Status: Published - Oct. 2022

Bibliographical note

Publisher Copyright:
IEEE
