IHCS: An Integrated Hybrid Cleaning System

Congcong Ge, Yunjun Gao, Xiaoye Miao, Lu Chen, Christian S. Jensen, Ziyuan Zhu

Publication: Contribution to journal, Conference article in journal, Research, Peer-reviewed

Abstract

Data cleaning is a prerequisite to subsequent data analysis, and is known to often be time-consuming and labor-intensive. We present IHCS, a hybrid data cleaning system that integrates error detection and repair to contend effectively with multiple error types. In a preprocessing step that precedes the data cleaning, IHCS formats the input dataset to be cleaned and transforms the applicable data quality rules into a unified format. Then, an MLN (Markov logic network) index structure is built from the unified rules, enabling IHCS to handle multiple error types simultaneously. During cleaning, IHCS first tackles abnormalities through an abnormal group process and then generates multiple data versions based on the MLN index. Finally, IHCS eliminates conflicting values across the multiple versions and derives the final, unified clean data. A visual interface enables monitoring of the cleaning process and analysis of the cleaning results.
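The final step, eliminating conflicting values across multiple data versions, can be illustrated with a minimal sketch. The abstract does not say how IHCS resolves these conflicts, so this sketch assumes a simple per-cell majority vote as a stand-in; it is not the IHCS algorithm. The names `fuse_versions`, `Cell`, and `Version` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: a cell is (row_id, attribute),
# and a data version maps each cell to its (possibly repaired) value.
Cell = Tuple[int, str]
Version = Dict[Cell, str]


def fuse_versions(versions: List[Version]) -> Version:
    """Merge several cleaned versions into one by per-cell majority vote.

    Assumed stand-in for IHCS's conflict elimination. Ties are broken by
    the order in which values are first seen, because Counter.most_common
    lists equal counts in first-encountered order.
    """
    # Collect every cell that appears in at least one version.
    cells = set()
    for version in versions:
        cells.update(version)

    fused: Version = {}
    for cell in sorted(cells):
        # Count the candidate values proposed for this cell.
        votes = Counter(v[cell] for v in versions if cell in v)
        fused[cell] = votes.most_common(1)[0][0]
    return fused


if __name__ == "__main__":
    versions = [
        {(0, "city"): "Aalborg", (0, "zip"): "9000"},
        {(0, "city"): "Aalborg", (0, "zip"): "9200"},
        {(0, "city"): "Alborg", (0, "zip"): "9000"},
    ]
    print(fuse_versions(versions))
```

Majority voting is only the simplest way to reconcile versions; a system such as IHCS may instead weight candidate values, for example by the reliability of the rules that produced them.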
Original language: English
Journal: Proceedings of the VLDB Endowment
Volume: 12
Issue number: 12
Pages (from-to): 1874-1877
Number of pages: 4
ISSN: 2150-8097
Status: Published - 2019
Event: 45th International Conference on Very Large Data Bases
Duration: 26 Aug 2019 to 30 Aug 2019
Conference number: 45
http://vldb.org/2019/

