Extraction of Validating Shapes from very large Knowledge Graphs

Kashif Rabbani, Matteo Lissandrini, Katja Hose

Research output: Contribution to journal › Conference article in Journal › Research › peer-review

2 Citations (Scopus)
7 Downloads (Pure)

Abstract

Knowledge Graphs (KGs) represent heterogeneous domain knowledge on the Web and within organizations. Shape constraint languages make it possible to define validating shapes that ensure the quality of the data in KGs. Existing techniques for extracting validating shapes often fail to extract complete shapes, do not scale, and are prone to producing spurious shapes. To address these shortcomings, we propose the Quality Shapes Extraction (QSE) approach for extracting validating shapes from very large graphs, for which we devise both an exact and an approximate solution. QSE provides information about the reliability of shape constraints by computing their confidence and support within a KG, which makes it possible to identify the shapes that are most informative and least likely to be affected by incomplete or incorrect data. To the best of our knowledge, QSE is the first approach to extract a complete set of validating shapes from WikiData. Moreover, QSE provides a 12x reduction in extraction time compared to existing approaches, while filtering out up to 93% of the invalid and spurious shapes. This results in a reduction of up to 2 orders of magnitude in the number of constraints presented to the user, e.g., from 11,916 to 809 on DBpedia.
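
The abstract does not define support and confidence, so the following is only a minimal sketch of how such pruning could work, not the QSE implementation. It assumes that the support of a candidate constraint (class, property) is the number of entities of that class that use the property, and that its confidence is that count divided by the number of entities of the class. The function names, the `ex:` identifiers, and the thresholds are hypothetical, and the sketch ignores the object types that real shape constraints would also describe.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # assumed marker for class-membership triples


def constraint_stats(triples):
    """Return {(cls, prop): (support, confidence)} for (s, p, o) triples.

    Assumed definitions (not taken from the abstract):
      support    = number of entities of cls that use prop
      confidence = support / number of entities of cls
    """
    types = defaultdict(set)  # entity -> classes it belongs to
    props = defaultdict(set)  # entity -> properties it uses
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            props[s].add(p)

    class_size = defaultdict(int)
    support = defaultdict(int)
    for entity, classes in types.items():
        for cls in classes:
            class_size[cls] += 1
            for p in props.get(entity, ()):
                support[(cls, p)] += 1

    return {key: (n, n / class_size[key[0]]) for key, n in support.items()}


def prune(stats, min_support, min_confidence):
    """Keep only candidates that meet both (hypothetical) thresholds."""
    return {
        key: (sup, conf)
        for key, (sup, conf) in stats.items()
        if sup >= min_support and conf >= min_confidence
    }


# Toy graph: every person has a name; one person has a stray ex:capital.
triples = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:carol", RDF_TYPE, "ex:Person"),
    ("ex:dave", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:carol", "ex:name", "Carol"),
    ("ex:dave", "ex:name", "Dave"),
    ("ex:alice", "ex:capital", "ex:Paris"),  # likely erroneous triple
]

stats = constraint_stats(triples)
kept = prune(stats, min_support=2, min_confidence=0.5)
print(kept)
```

In this toy graph the candidate (ex:Person, ex:capital) is backed by a single entity out of four, so it falls below both thresholds and is dropped, while (ex:Person, ex:name) is kept. This is the kind of spurious constraint that support and confidence are meant to expose.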

Original language: English
Journal: Proceedings of the VLDB Endowment
Volume: 16
Issue number: 5
Pages (from-to): 1023-1032
Number of pages: 10
ISSN: 2150-8097
Publication status: Published - 2023
Event: 49th International Conference on Very Large Data Bases, VLDB 2023 - Vancouver, Canada
Duration: 28 Aug 2023 to 1 Sept 2023

Conference

Conference: 49th International Conference on Very Large Data Bases, VLDB 2023
Country/Territory: Canada
City: Vancouver
Period: 28/08/2023 to 01/09/2023

Bibliographical note

Funding Information:
This research was partially funded by the Danish Council for Independent Research (DFF) under grant agreement no. DFF-804800051B, the EU’s H2020 research and innovation programme under grant agreement No 838216, and the Poul Due Jensen Foundation.

Publisher Copyright:
© 2023, VLDB Endowment. All rights reserved.
