Sharing-Aware Horizontal Partitioning for Exploiting Correlations during Query Processing

Kostas Tzoumas, Amol Deshpande, Christian Søndergaard Jensen

Research output: Contribution to journalJournal articleResearchpeer-review

6 Citations (Scopus)

Abstract

Optimization of join queries based on average selectivities is suboptimal in highly correlated databases. In such databases, relations are naturally divided into partitions, each partition having substantially different statistical characteristics. It is very compelling to discover such data partitions during query optimization and create multiple plans for a given query, one plan being optimal for a particular combination of data partitions. This scenario calls for the sharing of state among plans, so that common intermediate results are not recomputed. We study this problem in a setting with a routing-based query execution engine based on eddies [1]. Eddies naturally encapsulate horizontal partitioning and maximal state sharing across multiple plans. We define the notion of a conditional join plan, a novel representation of the search space that enables us to address the problem in a principled way. We present a lowoverhead greedy algorithm that uses statistical summaries based on graphical models. Experimental results suggest an order of magnitude faster execution time over traditional optimization for high correlations, while maintaining the same performance for low correlations.

[1] R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD, pp. 261272, 2000.
Original languageEnglish
JournalProceedings of the VLDB Endowment
Volume3
Issue number1-2
Pages (from-to)542-553
ISSN2150-8097
Publication statusPublished - Sep 2010

Fingerprint

Query processing
Engines

Cite this

@article{58fc04eaa29f4081bf9672489592ad49,
title = "Sharing-Aware Horizontal Partitioning for Exploiting Correlations during Query Processing",
abstract = "Optimization of join queries based on average selectivities is suboptimal in highly correlated databases. In such databases, relations are naturally divided into partitions, each partition having substantially different statistical characteristics. It is very compelling to discover such data partitions during query optimization and create multiple plans for a given query, one plan being optimal for a particular combination of data partitions. This scenario calls for the sharing of state among plans, so that common intermediate results are not recomputed. We study this problem in a setting with a routing-based query execution engine based on eddies [1]. Eddies naturally encapsulate horizontal partitioning and maximal state sharing across multiple plans. We define the notion of a conditional join plan, a novel representation of the search space that enables us to address the problem in a principled way. We present a lowoverhead greedy algorithm that uses statistical summaries based on graphical models. Experimental results suggest an order of magnitude faster execution time over traditional optimization for high correlations, while maintaining the same performance for low correlations. [1] R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD, pp. 261272, 2000.",
author = "Kostas Tzoumas and Amol Deshpande and Jensen, {Christian S{\o}ndergaard}",
year = "2010",
month = "9",
language = "English",
volume = "3",
pages = "542--553",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "VLDB Endowment",
number = "1-2",

}

Sharing-Aware Horizontal Partitioning for Exploiting Correlations during Query Processing. / Tzoumas, Kostas; Deshpande, Amol; Jensen, Christian Søndergaard.

In: Proceedings of the VLDB Endowment, Vol. 3, No. 1-2, 09.2010, p. 542-553.

Research output: Contribution to journalJournal articleResearchpeer-review

TY - JOUR

T1 - Sharing-Aware Horizontal Partitioning for Exploiting Correlations during Query Processing

AU - Tzoumas, Kostas

AU - Deshpande, Amol

AU - Jensen, Christian Søndergaard

PY - 2010/9

Y1 - 2010/9

N2 - Optimization of join queries based on average selectivities is suboptimal in highly correlated databases. In such databases, relations are naturally divided into partitions, each partition having substantially different statistical characteristics. It is very compelling to discover such data partitions during query optimization and create multiple plans for a given query, one plan being optimal for a particular combination of data partitions. This scenario calls for the sharing of state among plans, so that common intermediate results are not recomputed. We study this problem in a setting with a routing-based query execution engine based on eddies [1]. Eddies naturally encapsulate horizontal partitioning and maximal state sharing across multiple plans. We define the notion of a conditional join plan, a novel representation of the search space that enables us to address the problem in a principled way. We present a lowoverhead greedy algorithm that uses statistical summaries based on graphical models. Experimental results suggest an order of magnitude faster execution time over traditional optimization for high correlations, while maintaining the same performance for low correlations. [1] R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD, pp. 261272, 2000.

AB - Optimization of join queries based on average selectivities is suboptimal in highly correlated databases. In such databases, relations are naturally divided into partitions, each partition having substantially different statistical characteristics. It is very compelling to discover such data partitions during query optimization and create multiple plans for a given query, one plan being optimal for a particular combination of data partitions. This scenario calls for the sharing of state among plans, so that common intermediate results are not recomputed. We study this problem in a setting with a routing-based query execution engine based on eddies [1]. Eddies naturally encapsulate horizontal partitioning and maximal state sharing across multiple plans. We define the notion of a conditional join plan, a novel representation of the search space that enables us to address the problem in a principled way. We present a lowoverhead greedy algorithm that uses statistical summaries based on graphical models. Experimental results suggest an order of magnitude faster execution time over traditional optimization for high correlations, while maintaining the same performance for low correlations. [1] R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD, pp. 261272, 2000.

M3 - Journal article

VL - 3

SP - 542

EP - 553

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

SN - 2150-8097

IS - 1-2

ER -