Efficient and Incremental Clustering Algorithms on Star-Schema Heterogeneous Graphs

Lu Chen; Yunjun Gao; Yuanliang Zhang; Christian S. Jensen; Bolong Zheng

doi:10.1109/ICDE.2019.00031

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

14 Citations (Scopus)

Abstract

Many datasets including social media data and bibliographic data can be modeled as graphs. Clustering such graphs is able to provide useful insights into the structure of the data. To improve the quality of clustering, node attributes can be taken into account, resulting in attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. This enables the use of personalized pagerank (PPR) as a unified distance measure that captures both structural and attribute similarity. We employ DBSCAN for clustering, and we update edge weights iteratively to balance the importance of different attributes. To improve the efficiency of the clustering, we develop two incremental approaches that aim to enable efficient PPR score computation when edge weights are updated. To boost the effectiveness of the clustering, we propose a simple yet effective edge weight update strategy based on entropy. In addition, we present a game theory based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets offer insight into the effectiveness and efficiency of our proposals, compared with existing methods.
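As an illustration of the idea described in the abstract, the following is a minimal sketch of how attribute values can be turned into extra graph nodes so that a single personalized PageRank (PPR) computation reflects both links and shared attributes. It is not the authors' implementation: the function names, the toy graph, the restart probability of 0.15, and the plain power-iteration solver are all assumptions made for this example, and the paper's incremental PPR computation, entropy-based weight updates, and DBSCAN step are not reproduced here.

```python
def build_star_schema_graph(links, attributes, attribute_weight=1.0):
    """Build an undirected weighted graph as a dict of dicts.

    Structural links connect the original nodes; every attribute value
    becomes its own node, linked to each original node that carries it.
    (Hypothetical helper, not taken from the paper.)
    """
    graph = {}

    def add_edge(u, v, weight):
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight

    for u, v in links:
        add_edge(u, v, 1.0)
    for node, values in attributes.items():
        for value in values:
            add_edge(node, ("attr", value), attribute_weight)
    return graph


def personalized_pagerank(graph, source, alpha=0.15, tol=1e-12, max_iter=10_000):
    """Approximate PPR scores from `source` by power iteration.

    `alpha` is the restart probability (an assumed value). Every node is
    assumed to have at least one edge, so no row of the transition
    matrix is empty.
    """
    scores = {node: 0.0 for node in graph}
    scores[source] = 1.0
    for _ in range(max_iter):
        updated = {node: 0.0 for node in graph}
        updated[source] = alpha  # restart mass returns to the source
        for node, neighbors in graph.items():
            total = sum(neighbors.values())
            for neighbor, weight in neighbors.items():
                # Spread the walking mass proportionally to edge weight.
                updated[neighbor] += (1 - alpha) * scores[node] * weight / total
        delta = sum(abs(updated[n] - scores[n]) for n in graph)
        scores = updated
        if delta < tol:
            break
    return scores


# Toy bibliographic graph: four papers on a citation path, two topics.
links = [("p0", "p1"), ("p1", "p2"), ("p2", "p3")]
attributes = {"p0": ["db"], "p1": ["db"], "p2": ["ml"], "p3": ["ml"]}
graph = build_star_schema_graph(links, attributes)
ppr = personalized_pagerank(graph, "p0")
```

In this sketch, raising `attribute_weight` shifts more of the random-walk mass through attribute nodes, which is the kind of knob an edge-weight update strategy would adjust between clustering rounds.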

Original language	English
Title of host publication	Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019
Number of pages	12
Publisher	IEEE
Publication date	2019
Pages	256-267
Article number	8731611
ISBN (Print)	978-1-5386-7475-8
ISBN (Electronic)	978-1-5386-7474-1
DOIs	https://doi.org/10.1109/ICDE.2019.00031
Publication status	Published - 2019
Event	The 35th IEEE International Conference on Data Engineering (ICDE), Macau, China, 8 Apr 2019 → 12 Apr 2019

Conference

Conference	The 35th IEEE International Conference on Data Engineering (ICDE)
Location	Macau
Country/Territory	China
City	Macau
Period	08/04/2019 → 12/04/2019

Series	Proceedings of the International Conference on Data Engineering
ISSN	1063-6382

Keywords

Algorithm
Graph clustering
Graph mining
Heterogeneous graph

Cite this

@inproceedings{55b896c7a06a4249921fcc9a114c8712,

title = "Efficient and Incremental Clustering Algorithms on Star-Schema Heterogeneous Graphs",

abstract = "Many datasets including social media data and bibliographic data can be modeled as graphs. Clustering such graphs is able to provide useful insights into the structure of the data. To improve the quality of clustering, node attributes can be taken into account, resulting in attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. This enables the use of personalized pagerank (PPR) as a unified distance measure that captures both structural and attribute similarity. We employ DBSCAN for clustering, and we update edge weights iteratively to balance the importance of different attributes. To improve the efficiency of the clustering, we develop two incremental approaches that aim to enable efficient PPR score computation when edge weights are updated. To boost the effectiveness of the clustering, we propose a simple yet effective edge weight update strategy based on entropy. In addition, we present a game theory based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets offer insight into the effectiveness and efficiency of our proposals, compared with existing methods.",

keywords = "Algorithm, Graph clustering, Graph mining, Heterogeneous graph",

author = "Lu Chen and Yunjun Gao and Yuanliang Zhang and Jensen, {Christian S.} and Bolong Zheng",

year = "2019",

doi = "10.1109/ICDE.2019.00031",

language = "English",

isbn = "978-1-5386-7475-8",

series = "Proceedings of the International Conference on Data Engineering",

publisher = "IEEE",

pages = "256--267",

booktitle = "Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019",

address = "United States",

note = "The 35th IEEE International Conference on Data Engineering (ICDE), ICDE 2019 ; Conference date: 08-04-2019 Through 12-04-2019",

}

Chen, L, Gao, Y, Zhang, Y, Jensen, CS & Zheng, B 2019, Efficient and Incremental Clustering Algorithms on Star-Schema Heterogeneous Graphs. in Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019., 8731611, IEEE, Proceedings of the International Conference on Data Engineering, pp. 256-267, The 35th IEEE International Conference on Data Engineering (ICDE), Macau, China, 08/04/2019. https://doi.org/10.1109/ICDE.2019.00031

Efficient and Incremental Clustering Algorithms on Star-Schema Heterogeneous Graphs. / Chen, Lu; Gao, Yunjun; Zhang, Yuanliang et al.
Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019. IEEE, 2019. p. 256-267 8731611 (Proceedings of the International Conference on Data Engineering).

TY - GEN

T1 - Efficient and Incremental Clustering Algorithms on Star-Schema Heterogeneous Graphs

AU - Chen, Lu

AU - Gao, Yunjun

AU - Zhang, Yuanliang

AU - Jensen, Christian S.

AU - Zheng, Bolong

PY - 2019

Y1 - 2019

N2 - Many datasets including social media data and bibliographic data can be modeled as graphs. Clustering such graphs is able to provide useful insights into the structure of the data. To improve the quality of clustering, node attributes can be taken into account, resulting in attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. This enables the use of personalized pagerank (PPR) as a unified distance measure that captures both structural and attribute similarity. We employ DBSCAN for clustering, and we update edge weights iteratively to balance the importance of different attributes. To improve the efficiency of the clustering, we develop two incremental approaches that aim to enable efficient PPR score computation when edge weights are updated. To boost the effectiveness of the clustering, we propose a simple yet effective edge weight update strategy based on entropy. In addition, we present a game theory based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets offer insight into the effectiveness and efficiency of our proposals, compared with existing methods.

AB - Many datasets including social media data and bibliographic data can be modeled as graphs. Clustering such graphs is able to provide useful insights into the structure of the data. To improve the quality of clustering, node attributes can be taken into account, resulting in attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. This enables the use of personalized pagerank (PPR) as a unified distance measure that captures both structural and attribute similarity. We employ DBSCAN for clustering, and we update edge weights iteratively to balance the importance of different attributes. To improve the efficiency of the clustering, we develop two incremental approaches that aim to enable efficient PPR score computation when edge weights are updated. To boost the effectiveness of the clustering, we propose a simple yet effective edge weight update strategy based on entropy. In addition, we present a game theory based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets offer insight into the effectiveness and efficiency of our proposals, compared with existing methods.

KW - Algorithm

KW - Graph clustering

KW - Graph mining

KW - Heterogeneous graph

U2 - 10.1109/ICDE.2019.00031

DO - 10.1109/ICDE.2019.00031

M3 - Article in proceeding

SN - 978-1-5386-7475-8

T3 - Proceedings of the International Conference on Data Engineering

SP - 256

EP - 267

BT - Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019

PB - IEEE

T2 - The 35th IEEE International Conference on Data Engineering (ICDE)

Y2 - 8 April 2019 through 12 April 2019

ER -
