Abstract
What-if analysis is a data-intensive exploration to inspect how changes in a set of input parameters of a model influence some outcomes. It is motivated by a user trying to understand the sensitivity of a model to a certain parameter in order to reach a set of goals that are defined over the outcomes. To avoid an exploration of all possible combinations of parameter values, efficient what-if analysis calls for a partitioning of parameter values into data ranges and a unified representation of the obtained outcomes per range. Traditional techniques to capture data ranges, such as histograms, are limited to one outcome dimension. Yet, in practice, what-if analysis often involves conflicting goals that are defined over different dimensions of the outcome. Working on each of those goals independently cannot capture the inherent trade-off between them. In this paper, we propose techniques to recommend data ranges for what-if analysis, which capture not only data regularities, but also the trade-off between conflicting goals. Specifically, we formulate a parametric data partitioning problem and propose a method to find an optimal solution for it. Targeting scalability to large datasets, we further provide a heuristic solution to this problem. By theoretical and empirical analyses, we establish performance guarantees in terms of runtime and result quality.
Original language | English |
---|---|
Title of host publication | Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018 |
Number of pages | 12 |
Publisher | IEEE |
Publication date | 24 Oct 2018 |
Pages | 89-100 |
Article number | 8509239 |
ISBN (Print) | 978-1-5386-5521-4 |
ISBN (Electronic) | 978-1-5386-5520-7 |
Publication status | Published - 24 Oct 2018 |
Event | 34th IEEE International Conference on Data Engineering, ICDE 2018 - Paris, France. Duration: 16 Apr 2018 → 19 Apr 2018 |
Conference
Conference | 34th IEEE International Conference on Data Engineering, ICDE 2018 |
---|---|
Country/Territory | France |
City | Paris |
Period | 16/04/2018 → 19/04/2018 |
Keywords
- Data partitioning
- Pareto analysis
- What-if analysis