TY - GEN
T1 - On the selection of SPARQL endpoints to efficiently execute federated SPARQL queries
AU - Vidal, Maria Esther
AU - Castillo, Simón
AU - Acosta, Maribel
AU - Montoya, Gabriela
AU - Palma, Guillermo
PY - 2016/1/1
Y1 - 2016/1/1
N2 - We consider the problem of source selection and query decomposition in federations of SPARQL endpoints, where query decompositions of a SPARQL query should reduce execution time and maximize answer completeness. This problem is in general intractable, and performance and answer completeness of SPARQL queries can be considerably affected when the number of SPARQL endpoints in a federation increases. We devise a formalization of this problem as the Vertex Coloring Problem and propose an approximate algorithm named Fed- DSATUR. We rely on existing results from graph theory to characterize the family of SPARQL queries for which Fed-DSATUR can produce optimal decompositions in polynomial time on the size of the query, i.e., on the number of SPARQL triple patterns in the query. Fed-DSATUR scales up much better to SPARQL queries with a large number of triple patterns, and may exhibit significant improvements in performance while answer completeness remains close to 100%. More importantly, we put our results in perspective, and provide evidence of SPARQL queries that are hard to decompose and constitute new challenges for data management.
AB - We consider the problem of source selection and query decomposition in federations of SPARQL endpoints, where query decompositions of a SPARQL query should reduce execution time and maximize answer completeness. This problem is in general intractable, and performance and answer completeness of SPARQL queries can be considerably affected when the number of SPARQL endpoints in a federation increases. We devise a formalization of this problem as the Vertex Coloring Problem and propose an approximate algorithm named Fed- DSATUR. We rely on existing results from graph theory to characterize the family of SPARQL queries for which Fed-DSATUR can produce optimal decompositions in polynomial time on the size of the query, i.e., on the number of SPARQL triple patterns in the query. Fed-DSATUR scales up much better to SPARQL queries with a large number of triple patterns, and may exhibit significant improvements in performance while answer completeness remains close to 100%. More importantly, we put our results in perspective, and provide evidence of SPARQL queries that are hard to decompose and constitute new challenges for data management.
UR - http://www.scopus.com/inward/record.url?scp=84969278990&partnerID=8YFLogxK
U2 - 10.1007/978-3-662-49534-6_4
DO - 10.1007/978-3-662-49534-6_4
M3 - Article in proceeding
AN - SCOPUS:84969278990
SN - 9783662495339
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 109
EP - 149
BT - Transactions on Large-Scale Data- and Knowledge-Centered Systems XXV
A2 - Hameurlain, Abdelkader
A2 - Küng, Josef
A2 - Wagner, Roland
PB - Springer
T2 - International Conference on Transactions on Large-Scale Data- and Knowledge-Centered Systems, TLDKS 2016
Y2 - 1 January 2016
ER -