Comparing data summaries for processing live queries over Linked Data

Jürgen Umbrich; Katja Hose; Marcel Karnstedt; Andreas Harth; Axel Polleres

doi:10.1007/s11280-010-0107-z

Corresponding author for this work: Andreas Harth

Research output: Contribution to journal › Journal article › Research › peer-review

63 Citations (Scopus)

Abstract

A growing amount of Linked Data (graph-structured data accessible at sources distributed across the Web) enables advanced data integration and decision-making applications. Typical systems operating on Linked Data collect (crawl) and pre-process (index) large amounts of data, and evaluate queries against a centralised repository. Given that crawling and indexing are time-consuming operations, the data in the centralised index may be out of date at query execution time. An ideal query answering system for querying Linked Data live should return current answers in a reasonable amount of time, even on corpora as large as the Web. In such a live query system, source selection (determining which sources contribute answers to a query) is a crucial step. In this article we propose to use lightweight data summaries for determining relevant sources during query evaluation. We compare several data structures and hash functions with respect to their suitability for building such summaries, stressing benefits for queries that contain joins and require ranking of results and sources. We elaborate on join variants, join ordering and ranking. We analyse the different approaches theoretically and provide results of an extensive experimental evaluation.

Original language	English
Journal	World Wide Web
Volume	14
Issue number	5
Pages (from-to)	495-544
Number of pages	50
ISSN	1386-145X
DOIs	https://doi.org/10.1007/s11280-010-0107-z
Publication status	Published - 1 Oct 2011
Externally published	Yes

Keywords

index structures
Linked Data
RDF querying

Cite this

@article{30788c26af4c4ec2a7b23eb40f013ebc,

title = "Comparing data summaries for processing live queries over Linked Data",

abstract = "A growing amount of Linked Data (graph-structured data accessible at sources distributed across the Web) enables advanced data integration and decision-making applications. Typical systems operating on Linked Data collect (crawl) and pre-process (index) large amounts of data, and evaluate queries against a centralised repository. Given that crawling and indexing are time-consuming operations, the data in the centralised index may be out of date at query execution time. An ideal query answering system for querying Linked Data live should return current answers in a reasonable amount of time, even on corpora as large as the Web. In such a live query system, source selection (determining which sources contribute answers to a query) is a crucial step. In this article we propose to use lightweight data summaries for determining relevant sources during query evaluation. We compare several data structures and hash functions with respect to their suitability for building such summaries, stressing benefits for queries that contain joins and require ranking of results and sources. We elaborate on join variants, join ordering and ranking. We analyse the different approaches theoretically and provide results of an extensive experimental evaluation.",

keywords = "index structures, Linked Data, RDF querying",

author = "J{\"u}rgen Umbrich and Katja Hose and Marcel Karnstedt and Andreas Harth and Axel Polleres",

year = "2011",

month = oct,

day = "1",

doi = "10.1007/s11280-010-0107-z",

language = "English",

volume = "14",

pages = "495--544",

journal = "World Wide Web",

issn = "1386-145X",

publisher = "Springer Publishing Company",

number = "5",

}

TY - JOUR

T1 - Comparing data summaries for processing live queries over Linked Data

AU - Umbrich, Jürgen

AU - Hose, Katja

AU - Karnstedt, Marcel

AU - Harth, Andreas

AU - Polleres, Axel

PY - 2011/10/1

Y1 - 2011/10/1

AB - A growing amount of Linked Data (graph-structured data accessible at sources distributed across the Web) enables advanced data integration and decision-making applications. Typical systems operating on Linked Data collect (crawl) and pre-process (index) large amounts of data, and evaluate queries against a centralised repository. Given that crawling and indexing are time-consuming operations, the data in the centralised index may be out of date at query execution time. An ideal query answering system for querying Linked Data live should return current answers in a reasonable amount of time, even on corpora as large as the Web. In such a live query system, source selection (determining which sources contribute answers to a query) is a crucial step. In this article we propose to use lightweight data summaries for determining relevant sources during query evaluation. We compare several data structures and hash functions with respect to their suitability for building such summaries, stressing benefits for queries that contain joins and require ranking of results and sources. We elaborate on join variants, join ordering and ranking. We analyse the different approaches theoretically and provide results of an extensive experimental evaluation.

KW - index structures

KW - Linked Data

KW - RDF querying

UR - http://www.scopus.com/inward/record.url?scp=80052289616&partnerID=8YFLogxK

U2 - 10.1007/s11280-010-0107-z

DO - 10.1007/s11280-010-0107-z

M3 - Journal article

AN - SCOPUS:80052289616

SN - 1386-145X

VL - 14

SP - 495

EP - 544

JO - World Wide Web

JF - World Wide Web

IS - 5

ER -
