Abstract
As a result of decades of research and industrial development, modern query optimizers are complex software artifacts. However, the quality of the query plan chosen by an optimizer is largely determined by the quality of the underlying statistical summaries. Small selectivity estimation errors, propagated exponentially, can lead to severely sub-optimal plans. Modern optimizers typically maintain one-dimensional statistical summaries and make the attribute value independence and join uniformity assumptions for efficiently estimating selectivities. Therefore, selectivity estimation errors in today's optimizers are frequently caused by missed correlations between attributes. We present a selectivity estimation approach that does not make the independence assumptions. By carefully using concepts from the field of graphical models, we are able to factor the joint probability distribution of all the attributes in the database into small, usually two-dimensional distributions. We describe several optimizations that can make selectivity estimation highly efficient, and we present a complete implementation inside PostgreSQL's query optimizer. Experimental results indicate an order of magnitude better selectivity estimates, while keeping optimization time in the range of tens of milliseconds.
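The core problem the abstract describes — multiplying one-dimensional selectivities under the attribute value independence (AVI) assumption versus consulting a small two-dimensional joint distribution — can be illustrated with a minimal sketch. This is a hypothetical toy example, not the paper's implementation; the table, column names, and helper functions are invented for illustration only.

```python
# Hypothetical illustration (not the paper's code): selectivity of a
# conjunctive predicate under the attribute-value-independence (AVI)
# assumption vs. the true two-dimensional joint distribution, for two
# strongly correlated columns.
from collections import Counter

# Toy table of (make, model) pairs; the columns are correlated.
rows = [("Honda", "Civic")] * 40 + [("Honda", "Accord")] * 10 + \
       [("Toyota", "Corolla")] * 50
n = len(rows)

# One-dimensional summaries, as a typical optimizer would maintain.
make_freq = Counter(r[0] for r in rows)
model_freq = Counter(r[1] for r in rows)

def sel_avi(make, model):
    """Selectivity under the AVI assumption: multiply 1-D selectivities."""
    return (make_freq[make] / n) * (model_freq[model] / n)

def sel_joint(make, model):
    """Selectivity from the 2-D joint distribution, the kind of small
    factor the graphical-model approach maintains."""
    return sum(1 for r in rows if r == (make, model)) / n

# AVI estimate: 0.5 * 0.4 = 0.2; true selectivity: 0.4 (a 2x underestimate).
print(sel_avi("Honda", "Civic"), sel_joint("Honda", "Civic"))
```

In a multi-join query such an error compounds multiplicatively at each estimation step, which is why the abstract speaks of errors propagating exponentially.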
Original language | English |
---|---|
Journal | Proceedings of the VLDB Endowment |
Volume | 4 |
Pages (from-to) | 852-863 |
Number of pages | 12 |
ISSN | 2150-8097 |
Status | Published - Aug. 2011 |
Event | The 37th International Conference on Very Large Data Bases - Seattle, Washington, USA. Duration: 29 Aug. 2011 → 3 Sep. 2011 |
Conference
Conference | The 37th International Conference on Very Large Data Bases |
---|---|
Country/Territory | USA |
City | Seattle, Washington |
Period | 29/08/2011 → 03/09/2011 |