Abstract
The top-k dominating query returns k data objects
which dominate the highest number of objects in a
dataset. This query is an important tool for decision support
since it provides data analysts with an intuitive way of finding
significant objects. In addition, it combines the advantages
of top-k and skyline queries without sharing their disadvantages:
(i) the output size can be controlled, (ii) no ranking
functions need to be specified by users, and (iii) the result
is independent of the scales at different dimensions. Despite
their importance, top-k dominating queries have not
received adequate attention from the research community.
This paper is an extensive study on the evaluation of top-k
dominating queries. First, we propose a set of algorithms
that apply to indexed multi-dimensional data. Second, we
investigate query evaluation on data that are not indexed. Finally,
we study a relaxed variant of the query which considers
dominance in dimensional subspaces. Experiments using
synthetic and real datasets demonstrate that our algorithms
significantly outperform a previous skyline-based approach.
We also illustrate the applicability of this multi-dimensional
analysis query by studying the meaningfulness of its results
on real data.
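
To make the query definition concrete, the following is a minimal brute-force sketch of a top-k dominating query. It is not one of the algorithms proposed in the paper; it is a naive quadratic baseline written only to illustrate the definition. It assumes that smaller values are preferable in every dimension, and the names `dominates` and `top_k_dominating` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better. p dominates q when p is no worse
    than q in every dimension and strictly better in at least one.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def top_k_dominating(points: List[Point], k: int) -> List[Tuple[int, Point]]:
    """Return the k points with the highest dominance counts.

    Naive O(n^2) illustration: each point is scored by the number of
    other points it dominates. Ties keep their input order because
    Python's sort is stable.
    """
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    scored.sort(key=lambda pair: -pair[0])
    return scored[:k]


if __name__ == "__main__":
    data = [(1, 1), (2, 2), (3, 1), (1, 3), (3, 3)]
    for score, point in top_k_dominating(data, 2):
        print(point, "dominates", score, "objects")
```

Because the score depends only on per-dimension comparisons, rescaling one dimension by a positive factor leaves the ranking unchanged, which is property (iii) above. The output size is fixed by `k`, and no ranking function is supplied, matching properties (i) and (ii).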
Original language | English |
---|---|
Journal | VLDB Journal |
Volume | 18 |
Issue number | 3 |
Pages (from-to) | 695-718 |
ISSN | 1066-8888 |
Publication status | Published - 2009 |