The Odyssey Approach for Optimizing Federated SPARQL Queries

Gabriela Montoya, Hala Skaf-Molli, Katja Hose

Research output: Contribution to book/anthology/report/conference proceeding, Article in proceeding, Research, peer-review

10 Citations (Scopus)

Abstract

Answering queries over a federation of SPARQL endpoints requires combining data from more than one data source. Optimizing queries in such scenarios is particularly challenging not only because of (i) the large variety of possible query execution plans that correctly answer the query but also because (ii) there is only limited access to statistics about schema and instance data of remote sources. To overcome these challenges, most federated query engines rely on heuristics to reduce the space of possible query execution plans or on dynamic programming strategies to produce optimal plans. Nevertheless, these plans may still exhibit a high number of intermediate results or high execution times because of heuristics and inaccurate cost estimations. In this paper, we present Odyssey, an approach that uses statistics that allow for a more accurate cost estimation for federated queries and therefore enables Odyssey to produce better query execution plans. Our experimental results show that Odyssey produces query execution plans that are better in terms of data transfer and execution time than state-of-the-art optimizers. Our experiments using the FedBench benchmark show execution time gains of at least 25 times on average.
Original language: English
Title of host publication: The Semantic Web - ISWC 2017: 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part I
Volume: 10587
Publisher: Springer
Publication date: 2017
Pages: 471-489
ISBN (Print): 978-3-319-68287-7
ISBN (Electronic): 978-3-319-68288-4
Publication status: Published - 2017
Event: The 16th International Semantic Web Conference - Vienna, Austria
Duration: 21 Oct 2017 to 31 Oct 2017
Conference number: 16th
https://iswc2017.semanticweb.org/

Conference

Conference: The 16th International Semantic Web Conference
Number: 16th
Country: Austria
City: Vienna
Period: 21/10/2017 to 31/10/2017
Internet address: https://iswc2017.semanticweb.org/
Series: Lecture Notes in Computer Science
ISSN: 0302-9743

Fingerprint

Statistics
Data transfer
Dynamic programming
Costs
Engines
Experiments

Keywords

  • Federated Queries
  • Query Optimization
  • Join Ordering
  • Source Selection

Cite this

Montoya, G., Skaf-Molli, H., & Hose, K. (2017). The Odyssey Approach for Optimizing Federated SPARQL Queries. In The Semantic Web - ISWC 2017: 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part I (Vol. 10587, pp. 471-489). Springer. Lecture Notes in Computer Science
Montoya, Gabriela ; Skaf-Molli, Hala ; Hose, Katja. / The Odyssey Approach for Optimizing Federated SPARQL Queries. The Semantic Web - ISWC 2017: 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part I. Vol. 10587 Springer, 2017. pp. 471-489 (Lecture Notes in Computer Science).
@inproceedings{964fa89cf57149d89970305304652c17,
title = "The Odyssey Approach for Optimizing Federated SPARQL Queries",
abstract = "Answering queries over a federation of SPARQL endpoints requires combining data from more than one data source. Optimizing queries in such scenarios is particularly challenging not only because of (i) the large variety of possible query execution plans that correctly answer the query but also because (ii) there is only limited access to statistics about schema and instance data of remote sources. To overcome these challenges, most federated query engines rely on heuristics to reduce the space of possible query execution plans or on dynamic programming strategies to produce optimal plans. Nevertheless, these plans may still exhibit a high number of intermediate results or high execution times because of heuristics and inaccurate cost estimations. In this paper, we present Odyssey, an approach that uses statistics that allow for a more accurate cost estimation for federated queries and therefore enables Odyssey to produce better query execution plans. Our experimental results show that Odyssey produces query execution plans that are better in terms of data transfer and execution time than state-of-the-art optimizers. Our experiments using the FedBench benchmark show execution time gains of at least 25 times on average.",
keywords = "Federated Queries, Query Optimization, Join Ordering, Source Selection",
author = "Gabriela Montoya and Hala Skaf-Molli and Katja Hose",
year = "2017",
language = "English",
isbn = "978-3-319-68287-7",
volume = "10587",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
pages = "471--489",
booktitle = "The Semantic Web - ISWC 2017",
address = "Germany",

}

Montoya, G, Skaf-Molli, H & Hose, K 2017, The Odyssey Approach for Optimizing Federated SPARQL Queries. in The Semantic Web - ISWC 2017: 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part I. vol. 10587, Springer, Lecture Notes in Computer Science, pp. 471-489, The 16th International Semantic Web Conference, Vienna, Austria, 21/10/2017.

The Odyssey Approach for Optimizing Federated SPARQL Queries. / Montoya, Gabriela; Skaf-Molli, Hala; Hose, Katja.

The Semantic Web - ISWC 2017: 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part I. Vol. 10587 Springer, 2017. p. 471-489 (Lecture Notes in Computer Science).

TY - GEN
T1 - The Odyssey Approach for Optimizing Federated SPARQL Queries
AU - Montoya, Gabriela
AU - Skaf-Molli, Hala
AU - Hose, Katja
PY - 2017
Y1 - 2017
N2 - Answering queries over a federation of SPARQL endpoints requires combining data from more than one data source. Optimizing queries in such scenarios is particularly challenging not only because of (i) the large variety of possible query execution plans that correctly answer the query but also because (ii) there is only limited access to statistics about schema and instance data of remote sources. To overcome these challenges, most federated query engines rely on heuristics to reduce the space of possible query execution plans or on dynamic programming strategies to produce optimal plans. Nevertheless, these plans may still exhibit a high number of intermediate results or high execution times because of heuristics and inaccurate cost estimations. In this paper, we present Odyssey, an approach that uses statistics that allow for a more accurate cost estimation for federated queries and therefore enables Odyssey to produce better query execution plans. Our experimental results show that Odyssey produces query execution plans that are better in terms of data transfer and execution time than state-of-the-art optimizers. Our experiments using the FedBench benchmark show execution time gains of at least 25 times on average.

AB - Answering queries over a federation of SPARQL endpoints requires combining data from more than one data source. Optimizing queries in such scenarios is particularly challenging not only because of (i) the large variety of possible query execution plans that correctly answer the query but also because (ii) there is only limited access to statistics about schema and instance data of remote sources. To overcome these challenges, most federated query engines rely on heuristics to reduce the space of possible query execution plans or on dynamic programming strategies to produce optimal plans. Nevertheless, these plans may still exhibit a high number of intermediate results or high execution times because of heuristics and inaccurate cost estimations. In this paper, we present Odyssey, an approach that uses statistics that allow for a more accurate cost estimation for federated queries and therefore enables Odyssey to produce better query execution plans. Our experimental results show that Odyssey produces query execution plans that are better in terms of data transfer and execution time than state-of-the-art optimizers. Our experiments using the FedBench benchmark show execution time gains of at least 25 times on average.

KW - Federated Queries
KW - Query Optimization
KW - Join Ordering
KW - Source Selection
M3 - Article in proceeding
SN - 978-3-319-68287-7
VL - 10587
T3 - Lecture Notes in Computer Science
SP - 471
EP - 489
BT - The Semantic Web - ISWC 2017
PB - Springer
ER -

Montoya G, Skaf-Molli H, Hose K. The Odyssey Approach for Optimizing Federated SPARQL Queries. In The Semantic Web - ISWC 2017: 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part I. Vol. 10587. Springer. 2017. p. 471-489. (Lecture Notes in Computer Science).