Automatically Extracted SHACL Shapes for WikiData, DBpedia, YAGO-4, and LUBM & Associated Coverage Statistics

Dataset

Description

The uploaded files contain automatically extracted SHACL shapes for the following datasets:

- WikiData (the truthy dump from September 2021, filtered by removing non-English strings) [1]
- DBpedia [2]
- YAGO-4 [3]
- LUBM (scale factor 500) [4]

The validating shapes for these datasets were generated by a program that parses the corresponding RDF files (in `.nt`, i.e., N-Triples, format). The extracted shapes use SHACL constructs such as sh:path, sh:minCount, sh:class, and sh:datatype. For each shape, we also record its coverage, i.e., the number of entities that satisfy the shape, using the void:entities predicate.
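The Python sketch below illustrates the kind of counting that underlies such coverage statistics. It is a minimal, hypothetical example and not the extraction program shipped as the Jar.

It makes these assumptions:

- It parses a simplified form of N-Triples.
- An entity counts toward a class if it has an `rdf:type` triple for that class.
- A (class, predicate) pair is counted once per entity of that class that uses the predicate.

The function names `count_support` and `min_count_one` are illustrative only.

```python
import re
from collections import defaultdict

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Simplified N-Triples line: subject, predicate, object, terminating dot.
# Assumption: this does not cover every edge case of the N-Triples grammar.
TRIPLE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")


def count_support(lines):
    """Return (class -> #entities, (class, predicate) -> #entities)."""
    types = defaultdict(set)  # entity -> classes it is typed with
    preds = defaultdict(set)  # entity -> non-type predicates it uses
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = TRIPLE.match(line)
        if not m:
            continue  # skip lines this simplified pattern cannot parse
        s, p, o = m.groups()
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            preds[s].add(p)

    class_count = defaultdict(int)
    prop_count = defaultdict(int)
    for entity, classes in types.items():
        for c in classes:
            class_count[c] += 1  # entity is covered by the class-level shape
            for p in preds.get(entity, ()):
                prop_count[(c, p)] += 1  # entity of class c uses predicate p
    return dict(class_count), dict(prop_count)


def min_count_one(class_count, prop_count):
    """(class, predicate) pairs used by every instance of the class,
    i.e. candidates for a property shape with sh:minCount 1."""
    return {k for k, n in prop_count.items() if n == class_count[k[0]]}


# Usage on a tiny, made-up sample (the ex.org IRIs are hypothetical).
sample = [
    '<http://ex.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/Person> .',
    '<http://ex.org/bob> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/Person> .',
    '<http://ex.org/alice> <http://ex.org/name> "Alice"@en .',
    '<http://ex.org/bob> <http://ex.org/name> "Bob" .',
    '<http://ex.org/alice> <http://ex.org/knows> <http://ex.org/bob> .',
]
classes, props = count_support(sample)
print(classes)                       # {'<http://ex.org/Person>': 2}
print(min_count_one(classes, props))  # {('<http://ex.org/Person>', '<http://ex.org/name>')}
```

Here the per-class count plays roughly the role that a void:entities value plays for a node shape, and the per-pair count plays that role for a property shape. Keeping per-entity sets in memory, as this sketch does, would not scale to a dump the size of WikiData. It is meant only to make the notion of coverage concrete.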

We provide the program we developed to extract these SHACL shapes as an executable Jar file.
More details about the datasets used to extract these shapes, and about how to run the Jar, are available in our GitHub repository: https://github.com/Kashif-Rabbani/validatingshapes.

[1] Vrandečić, Denny, and Markus Krötzsch. "Wikidata: a free collaborative knowledgebase." Communications of the ACM 57.10 (2014): 78-85.

[2] Auer, Sören, et al. "DBpedia: A nucleus for a web of open data." The Semantic Web. Springer, Berlin, Heidelberg, 2007. 722-735.

[3] Pellissier Tanon, Thomas, Gerhard Weikum, and Fabian Suchanek. "YAGO 4: A reason-able knowledge base." European Semantic Web Conference. Springer, Cham, 2020.

[4] Guo, Yuanbo, Zhengxiang Pan, and Jeff Heflin. "LUBM: A benchmark for OWL knowledge base systems." Journal of Web Semantics 3.2-3 (2005): 158-182.
Date made available: 2022
Publisher: Zenodo
