Estimating the extent of the effects of data quality through observations

Daniele Foroni, Matteo Lissandrini, Yannis Velegrakis

Publication: Contribution to book/anthology/report/conference proceeding › Conference article in proceeding › Research › peer-reviewed

5 Citations (Scopus)

Abstract

Existing data quality work has so far focused on computing many data characteristics as a means of quantifying different quality dimensions, such as freshness, consistency, accuracy, or completeness, all of which are defined with respect to some ideal (clean) dataset. We claim that this approach falls short of providing a full specification of the quality of the data, since it takes into consideration neither the task for which the data is to be used nor any future instances of the dataset. We argue that, apart from the difference from the clean dataset, it is equally important to know the degree to which such difference affects the results of the task at hand. Thus, we extend the existing data quality definition to include that degree. Our approach not only allows data quality to be considered in the context of the intended task, but can also provide useful information even in the absence of the clean dataset, and offers an understanding of the effect of data quality in future dataset instances. We describe a system and its implementation that computes this extended form of data quality through a principled approach of systematic noise generation and task result evaluation. We perform numerous experiments illustrating the effectiveness of the approach and how it allows contextualizing traditional data quality measures.
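The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of systematic noise generation followed by task result evaluation. It assumes a toy task (the mean of a numeric column) and a single noise type (missing values). All function names here are hypothetical and are not taken from the system described in the paper.

```python
import random
from statistics import mean


def inject_missing(values, rate, rng):
    """Return a copy of `values` with round(rate * n) entries set to None.

    Assumption: missing values stand in for one kind of data quality noise.
    """
    noisy = list(values)
    k = round(rate * len(noisy))
    for i in rng.sample(range(len(noisy)), k):
        noisy[i] = None
    return noisy


def task_mean(values):
    """Hypothetical task: the mean of the values that are present."""
    present = [v for v in values if v is not None]
    return mean(present) if present else float("nan")


def effect_curve(values, task, noise, rates, trials=30, seed=0):
    """Map each noise rate to the average absolute change in the task result.

    The result on the observed data is the reference point, so no clean
    dataset is required for this sketch.
    """
    rng = random.Random(seed)
    baseline = task(values)
    curve = {}
    for rate in rates:
        deviations = [
            abs(task(noise(values, rate, rng)) - baseline)
            for _ in range(trials)
        ]
        curve[rate] = mean(deviations)
    return curve


data = [float(x) for x in range(1, 101)]
curve = effect_curve(data, task_mean, inject_missing, rates=[0.0, 0.1, 0.3, 0.5])
for rate, deviation in curve.items():
    print(f"noise rate {rate:.1f}: mean deviation of task result {deviation:.3f}")
```

Swapping in a different `task` or `noise` function shows how the same amount of noise can matter more or less depending on what the data is used for, which is the contextualization the abstract argues for.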

Original language: English
Title: Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
Number of pages: 6
Publisher: IEEE
Publication date: Apr. 2021
Pages: 1913-1918
Article number: 9458690
ISBN (Print): 978-1-7281-9185-0
ISBN (Electronic): 978-1-7281-9184-3
DOI
Status: Published - Apr. 2021
Event: 37th IEEE International Conference on Data Engineering, ICDE 2021 - Virtual, Chania, Greece
Duration: 19 Apr. 2021 to 22 Apr. 2021

Conference

Conference: 37th IEEE International Conference on Data Engineering, ICDE 2021
Country/Territory: Greece
City: Virtual, Chania
Period: 19/04/2021 to 22/04/2021
Name: Proceedings - International Conference on Data Engineering
Volume: 2021-April
ISSN: 1084-4627

Bibliographical note

Publisher Copyright:
© 2021 IEEE.
