Optimizing SPARQL queries using shape statistics

Kashif Rabbani, Matteo Lissandrini, Katja Hose

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

11 Citations (Scopus)
184 Downloads (Pure)

Abstract

With the growing popularity of storing data in native RDF, we witness more and more diverse use cases with complex SPARQL queries. As a consequence, query optimization, and in particular cardinality estimation and join ordering, becomes even more crucial. Classical methods rely on global statistics covering the entire RDF graph as a whole. Such statistics fail to capture the correlations that are very common in RDF datasets, which leads to erroneous cardinality estimates and suboptimal query execution plans. The alternative of capturing correlations in a fine-grained manner, on the other hand, requires very costly preprocessing steps to create these statistics. Hence, in this paper we propose shapes statistics, which extend the recent SHACL standard with statistical information to capture the correlation between classes and properties. Our extensive experiments on synthetic and real data show that shapes statistics can be generated and managed with little overhead and without disadvantages in query runtime, while leading to noticeable improvements in cardinality estimation.
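
The contrast drawn above between global statistics and per-shape statistics can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the class and predicate names, the counts, and the two estimator functions are hypothetical and are not taken from the paper, and the estimators are deliberately simplified rather than the authors' method. The sketch estimates the cardinality of the pattern `?x a <Class> . ?x <predicate> ?y` once under an independence assumption over global counts, and once from a count stored per class and property.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PropertyStats:
    # Triples with this predicate whose subject is an instance of the class.
    triples: int


@dataclass
class ShapeStats:
    # Statistics attached to one class (hypothetical layout, for illustration).
    target_class: str
    instances: int
    properties: Dict[str, PropertyStats] = field(default_factory=dict)


# Hypothetical global statistics over the whole graph.
GLOBAL_SUBJECTS = 10_000
GLOBAL_PREDICATE_TRIPLES = {"advisor": 2_000}
GLOBAL_CLASS_INSTANCES = {"Student": 4_000, "Professor": 500}

# Hypothetical per-class statistics: in this toy data only students have advisors.
SHAPES = {
    "Student": ShapeStats("Student", 4_000, {"advisor": PropertyStats(2_000)}),
    "Professor": ShapeStats("Professor", 500, {}),
}


def estimate_global(cls: str, predicate: str) -> float:
    """Assume class membership and predicate usage are independent."""
    return (
        GLOBAL_CLASS_INSTANCES[cls]
        * GLOBAL_PREDICATE_TRIPLES[predicate]
        / GLOBAL_SUBJECTS
    )


def estimate_with_shapes(cls: str, predicate: str) -> float:
    """Read the class-specific count; 0 if the class never uses the predicate."""
    stats = SHAPES[cls].properties.get(predicate)
    return float(stats.triples) if stats else 0.0


if __name__ == "__main__":
    for cls in ("Student", "Professor"):
        print(cls, estimate_global(cls, "advisor"), estimate_with_shapes(cls, "advisor"))
```

In this toy data the independence assumption underestimates the Student pattern (800 against an actual 2,000) and overestimates the Professor pattern (100 against an actual 0), while the class-specific counts match the data by construction. Errors of this kind are what can push a join-ordering step toward a poor plan.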

Original language: English
Title of host publication: Advances in Database Technology: 24th International Conference on Extending Database Technology, EDBT 2021
Editors: Yannis Velegrakis, Demetris Zeinalipour, Panos K. Chrysanthis, Francesco Guerra
Number of pages: 6
Publisher: OpenProceedings.org
Publication date: 2021
Pages: 505-510
ISBN (Electronic): 978-3-89318-084-4
DOIs
Publication status: Published - 2021
Event: Advances in Database Technology - 24th International Conference on Extending Database Technology, EDBT 2021 - Virtual, Nicosia, Cyprus
Duration: 23 Mar 2021 to 26 Mar 2021

Conference

Conference: Advances in Database Technology - 24th International Conference on Extending Database Technology, EDBT 2021
Country/Territory: Cyprus
City: Virtual, Nicosia
Period: 23/03/2021 to 26/03/2021
Sponsor: Oracle, Snowflake, ZOOM, Zoom Video Communications, Inc.
Series: Advances in Database Technology
ISSN: 2367-2005

Bibliographical note

Funding Information:
Acknowledgments. This research was partially funded by the Danish Council for Independent Research (DFF) under grant agreement no. DFF-8048-00051B, the EU’s H2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 838216, and the Poul Due Jensen Foundation.

Publisher Copyright:
© 2021 Copyright held by the owner/author(s).
