Computing how-provenance for SPARQL queries via query rewriting

Daniel Hernández, Luis Galárraga, Katja Hose

Research output: Contribution to journal › Conference article in Journal › Research › peer-review

9 Citations (Scopus)
106 Downloads (Pure)


Over the past few years, we have witnessed the emergence of large knowledge graphs built by extracting and combining information from multiple sources. This has propelled many advances in query processing over knowledge graphs; however, providing provenance explanations for query results has so far been mostly neglected. We therefore propose a novel method, SPARQLprov, based on query rewriting, to compute how-provenance polynomials for SPARQL queries over knowledge graphs. Contrary to existing works, SPARQLprov is system-agnostic and can be applied to standard and already deployed SPARQL engines without the need for customized extensions. We rely on spm-semirings to compute polynomial annotations that respect the property of commutation with homomorphisms on monotonic and non-monotonic SPARQL queries without aggregate functions. Our evaluation on real and synthetic data shows that SPARQLprov over standard engines incurs an acceptable runtime overhead w.r.t. the original query, competing with state-of-the-art solutions for how-provenance computation.
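
As a rough illustration of what a how-provenance polynomial is, the following Python sketch annotates each triple with an identifier and combines annotations while evaluating a two-pattern join with projection. It is not SPARQLprov's query rewriting: it evaluates the query directly in memory, and it covers only the monotonic case using the standard semiring of polynomials with natural-number coefficients rather than the spm-semirings mentioned in the abstract. The data, the identifiers t1 to t4, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

# A polynomial is a Counter mapping a monomial (a sorted tuple of
# triple identifiers) to its natural-number coefficient.

def var(name):
    """Polynomial consisting of the single variable `name`."""
    return Counter({(name,): 1})

def plus(p, q):
    """Alternative derivations (union, projection): add coefficients."""
    return p + q

def times(p, q):
    """Joint use of inputs (join): multiply the polynomials."""
    out = Counter()
    for (m1, c1), (m2, c2) in product(p.items(), q.items()):
        out[tuple(sorted(m1 + m2))] += c1 * c2
    return out

# Hypothetical annotated knowledge graph: triple -> identifier.
triples = {
    ("alice", "worksAt", "orgA"): "t1",
    ("alice", "worksAt", "orgB"): "t2",
    ("orgA", "locatedIn", "countryX"): "t3",
    ("orgB", "locatedIn", "countryY"): "t4",
}

def people_with_located_employer(graph):
    """Roughly: SELECT ?p WHERE { ?p worksAt ?o . ?o locatedIn ?c }."""
    result = {}
    for (s1, p1, o1), x1 in graph.items():
        if p1 != "worksAt":
            continue
        for (s2, p2, _o2), x2 in graph.items():
            if p2 == "locatedIn" and s2 == o1:
                # One derivation uses both triples together: product.
                derivation = times(var(x1), var(x2))
                # Projection merges derivations of the same answer: sum.
                result[s1] = plus(result.get(s1, Counter()), derivation)
    return result

result = people_with_located_employer(triples)
```

Under these assumptions, the answer `alice` carries the polynomial t1·t3 + t2·t4, which records that she can be derived in two independent ways, each requiring two triples together.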

Original language: English
Journal: Proceedings of the VLDB Endowment
Issue number: 13
Pages (from-to): 3389-3401
Number of pages: 13
Publication status: Published - 2021
Event: 47th International Conference on Very Large Data Bases, VLDB 2021 - Virtual, Online
Duration: 16 Aug 2021 to 20 Aug 2021


Conference: 47th International Conference on Very Large Data Bases, VLDB 2021
City: Virtual, Online

Bibliographical note

Funding Information:
This research was partially funded by the Danish Council for Independent Research (DFF) under grant agreement no. DFF-8048-00051B and the Poul Due Jensen Foundation.

Publisher Copyright:
© 2021, VLDB Endowment. All rights reserved.


