ClusterEA: Scalable Entity Alignment with Stochastic Training and Normalized Mini-batch Similarities

Yunjun Gao; Xiaoze Liu; Junyang Wu; Tianyi Li; Pengfei Wang; Lu Chen

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

14 Citations (Scopus)

Abstract

Entity alignment (EA) aims at finding equivalent entities in different knowledge graphs (KGs). Embedding-based approaches have dominated the EA task in recent years. Those methods face problems that come from the geometric properties of embedding vectors, including hubness and isolation. To solve these geometric problems, many normalization approaches have been adopted for EA. However, the increasing scale of KGs renders it hard for EA models to adopt the normalization processes, thus limiting their usage in real-world applications. To tackle this challenge, we present ClusterEA, a general framework that is capable of scaling up EA models and enhancing their results by leveraging normalization methods on mini-batches with a high entity equivalent rate. ClusterEA contains three components to align entities between large-scale KGs, including stochastic training, ClusterSampler, and SparseFusion. It first trains a large-scale Siamese GNN for EA in a stochastic fashion to produce entity embeddings. Based on the embeddings, a novel ClusterSampler strategy is proposed for sampling highly overlapped mini-batches. Finally, ClusterEA incorporates SparseFusion, which normalizes local and global similarity and then fuses all similarity matrices to obtain the final similarity matrix. Extensive experiments with real-life datasets on EA benchmarks offer insight into the proposed framework, and suggest that it is capable of outperforming the state-of-the-art scalable EA framework by up to 8 times in terms of Hits@1.
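
The abstract does not say which normalization ClusterEA applies to its similarity matrices. As an illustration only, the sketch below assumes CSLS (cross-domain similarity local scaling), a widely used hubness-reducing rescaling, and applies it to one small mini-batch similarity matrix. The function names, the toy matrix, and the choice k=2 are all hypothetical and are not taken from the paper.

```python
def csls_normalize(sim, k=2):
    """CSLS-style rescaling of a dense similarity matrix (list of rows).

    Assumption: CSLS stands in for the unspecified normalization step.
    Each score is penalized by the mean similarity of its row and of its
    column to their k nearest neighbours, which demotes hub entities.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    k = min(k, n_rows, n_cols)
    # Mean similarity of each source entity to its k closest targets.
    row_hub = [sum(sorted(row, reverse=True)[:k]) / k for row in sim]
    # Mean similarity of each target entity to its k closest sources.
    col_hub = [sum(sorted(col, reverse=True)[:k]) / k for col in zip(*sim)]
    return [
        [2.0 * sim[i][j] - row_hub[i] - col_hub[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


def argmax_rows(matrix):
    """Index of the best-scoring target for every source row."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


# Toy mini-batch: target 1 is a hub (similar to every source),
# target 2 is isolated (never the raw nearest neighbour of any source).
sim = [
    [0.90, 0.80, 0.10],
    [0.20, 0.85, 0.30],
    [0.10, 0.70, 0.60],
]

raw_pred = argmax_rows(sim)
csls_pred = argmax_rows(csls_normalize(sim, k=2))
print("raw nearest neighbours: ", raw_pred)
print("CSLS nearest neighbours:", csls_pred)
```

In this toy matrix the raw nearest-neighbour search maps two source entities to the same hub target, while the rescaled scores give each source a distinct target. Normalizing a dense matrix like this needs memory quadratic in the number of entities, which is the scaling obstacle the abstract attributes to large KGs and the reason it works on mini-batches.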

Original language	English
Title of host publication	KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Number of pages	11
Publisher	Association for Computing Machinery
Publication date	2022
Pages	421-431
ISBN (Electronic)	9781450393850
Publication status	Published - 2022
Event	28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022 - Washington, United States
Duration	14 Aug 2022 → 18 Aug 2022

Conference

Conference	28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022
Country/Territory	United States
City	Washington
Period	14/08/2022 → 18/08/2022
Sponsor	ACM SIGKDD, ACM SIGMOD

Access to Document

https://dl.acm.org/doi/10.1145/3534678.3539331

Cite this

@inproceedings{7bf685986cc445fbb4d81d95ae21aa57,
  title = "ClusterEA: Scalable Entity Alignment with Stochastic Training and Normalized Mini-batch Similarities",
  abstract = "Entity alignment (EA) aims at finding equivalent entities in different knowledge graphs (KGs). Embedding-based approaches have dominated the EA task in recent years. Those methods face problems that come from the geometric properties of embedding vectors, including hubness and isolation. To solve these geometric problems, many normalization approaches have been adopted for EA. However, the increasing scale of KGs renders it hard for EA models to adopt the normalization processes, thus limiting their usage in real-world applications. To tackle this challenge, we present ClusterEA, a general framework that is capable of scaling up EA models and enhancing their results by leveraging normalization methods on mini-batches with a high entity equivalent rate. ClusterEA contains three components to align entities between large-scale KGs, including stochastic training, ClusterSampler, and SparseFusion. It first trains a large-scale Siamese GNN for EA in a stochastic fashion to produce entity embeddings. Based on the embeddings, a novel ClusterSampler strategy is proposed for sampling highly overlapped mini-batches. Finally, ClusterEA incorporates SparseFusion, which normalizes local and global similarity and then fuses all similarity matrices to obtain the final similarity matrix. Extensive experiments with real-life datasets on EA benchmarks offer insight into the proposed framework, and suggest that it is capable of outperforming the state-of-the-art scalable EA framework by up to 8 times in terms of Hits@1.",
  author = "Yunjun Gao and Xiaoze Liu and Junyang Wu and Tianyi Li and Pengfei Wang and Lu Chen",
  year = "2022",
  language = "English",
  pages = "421--431",
  booktitle = "KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining",
  publisher = "Association for Computing Machinery",
  address = "United States",
  note = "28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022 ; Conference date: 14-08-2022 Through 18-08-2022",
}

Gao, Y, Liu, X, Wu, J, Li, T, Wang, P & Chen, L 2022, ClusterEA: Scalable Entity Alignment with Stochastic Training and Normalized Mini-batch Similarities. in KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, pp. 421-431, 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022, Washington, United States, 14/08/2022. <https://dl.acm.org/doi/10.1145/3534678.3539331>

ClusterEA: Scalable Entity Alignment with Stochastic Training and Normalized Mini-batch Similarities. / Gao, Yunjun; Liu, Xiaoze; Wu, Junyang et al.
KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2022. p. 421-431.

TY - GEN
T1 - ClusterEA: Scalable Entity Alignment with Stochastic Training and Normalized Mini-batch Similarities
AU - Gao, Yunjun
AU - Liu, Xiaoze
AU - Wu, Junyang
AU - Li, Tianyi
AU - Wang, Pengfei
AU - Chen, Lu
PY - 2022
Y1 - 2022
N2 - Entity alignment (EA) aims at finding equivalent entities in different knowledge graphs (KGs). Embedding-based approaches have dominated the EA task in recent years. Those methods face problems that come from the geometric properties of embedding vectors, including hubness and isolation. To solve these geometric problems, many normalization approaches have been adopted for EA. However, the increasing scale of KGs renders it hard for EA models to adopt the normalization processes, thus limiting their usage in real-world applications. To tackle this challenge, we present ClusterEA, a general framework that is capable of scaling up EA models and enhancing their results by leveraging normalization methods on mini-batches with a high entity equivalent rate. ClusterEA contains three components to align entities between large-scale KGs, including stochastic training, ClusterSampler, and SparseFusion. It first trains a large-scale Siamese GNN for EA in a stochastic fashion to produce entity embeddings. Based on the embeddings, a novel ClusterSampler strategy is proposed for sampling highly overlapped mini-batches. Finally, ClusterEA incorporates SparseFusion, which normalizes local and global similarity and then fuses all similarity matrices to obtain the final similarity matrix. Extensive experiments with real-life datasets on EA benchmarks offer insight into the proposed framework, and suggest that it is capable of outperforming the state-of-the-art scalable EA framework by up to 8 times in terms of Hits@1.
AB - Entity alignment (EA) aims at finding equivalent entities in different knowledge graphs (KGs). Embedding-based approaches have dominated the EA task in recent years. Those methods face problems that come from the geometric properties of embedding vectors, including hubness and isolation. To solve these geometric problems, many normalization approaches have been adopted for EA. However, the increasing scale of KGs renders it hard for EA models to adopt the normalization processes, thus limiting their usage in real-world applications. To tackle this challenge, we present ClusterEA, a general framework that is capable of scaling up EA models and enhancing their results by leveraging normalization methods on mini-batches with a high entity equivalent rate. ClusterEA contains three components to align entities between large-scale KGs, including stochastic training, ClusterSampler, and SparseFusion. It first trains a large-scale Siamese GNN for EA in a stochastic fashion to produce entity embeddings. Based on the embeddings, a novel ClusterSampler strategy is proposed for sampling highly overlapped mini-batches. Finally, ClusterEA incorporates SparseFusion, which normalizes local and global similarity and then fuses all similarity matrices to obtain the final similarity matrix. Extensive experiments with real-life datasets on EA benchmarks offer insight into the proposed framework, and suggest that it is capable of outperforming the state-of-the-art scalable EA framework by up to 8 times in terms of Hits@1.
M3 - Article in proceeding
SP - 421
EP - 431
BT - KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022
Y2 - 14 August 2022 through 18 August 2022
ER -
