SimpleETL: ETL Processing by Simple Specifications

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review


Abstract

Massive quantities of data are collected from many sources today. However, handling and integrating these data sources into a data warehouse is often labor-intensive. The complexity increases further when specific requirements exist. One such new requirement is the right to be forgotten, under which an organization must, upon request, delete all data about an individual. Another arises when facts are updated retrospectively. In this paper, we present the general framework SimpleETL, which is currently used for Extract-Transform-Load (ETL) processing in a company with such requirements. SimpleETL automatically handles all database interactions, such as creating fact tables, dimensions, and foreign keys. The framework also has features for handling version management of facts and implements four different methods for handling deleted facts. The framework enables, e.g., data scientists to program complete and complex ETL solutions very efficiently with only a few lines of code, which is demonstrated with a real-world example.
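The abstract states that SimpleETL derives the database objects (fact tables, dimensions, and foreign keys) from a short specification, so that the user does not write the DDL by hand. The paper's actual API is not reproduced here. The following is only a minimal, hypothetical Python sketch of that general idea, assuming a PostgreSQL-style target (`SERIAL` surrogate keys) and invented names (`Dimension`, `FactTable`, `generate_schema`, and the example tables) that are not part of SimpleETL.

```python
from dataclasses import dataclass


# Hypothetical specification classes for illustration only; these names are
# assumptions and are not SimpleETL's actual API.
@dataclass
class Dimension:
    name: str              # table name of the dimension
    attributes: list[str]  # natural attributes identifying a dimension member


@dataclass
class FactTable:
    name: str
    dimensions: list[Dimension]
    measures: list[str]


def dimension_ddl(dim: Dimension) -> str:
    # Surrogate key (PostgreSQL-style SERIAL) plus one column per attribute;
    # the UNIQUE constraint lets a loader look up existing members by value.
    cols = [f"{dim.name}_id SERIAL PRIMARY KEY"]
    cols += [f"{attr} TEXT NOT NULL" for attr in dim.attributes]
    cols.append(f"UNIQUE ({', '.join(dim.attributes)})")
    return f"CREATE TABLE {dim.name} ({', '.join(cols)});"


def fact_ddl(fact: FactTable) -> str:
    # One foreign-key column per dimension, then one numeric column per measure.
    cols = [
        f"{d.name}_id INTEGER NOT NULL REFERENCES {d.name} ({d.name}_id)"
        for d in fact.dimensions
    ]
    cols += [f"{m} NUMERIC" for m in fact.measures]
    return f"CREATE TABLE {fact.name} ({', '.join(cols)});"


def generate_schema(fact: FactTable) -> list[str]:
    # Dimension tables are emitted first so the fact table's foreign keys
    # reference tables that already exist.
    return [dimension_ddl(d) for d in fact.dimensions] + [fact_ddl(fact)]


# Usage: a toy sales star schema (hypothetical table and column names).
sales = FactTable(
    name="fact_sales",
    dimensions=[
        Dimension("dim_date", ["full_date"]),
        Dimension("dim_customer", ["customer_name"]),
    ],
    measures=["amount"],
)
statements = generate_schema(sales)
for stmt in statements:
    print(stmt)
```

The point of the design is that the star schema is declared once and every `CREATE TABLE` statement, including the foreign keys, follows from it. Data loading, dimension lookups, and the fact versioning and deletion handling mentioned in the abstract are deliberately left out of this sketch.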
Original language: English
Title of host publication: Proceedings of the 20th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data co-located with 10th EDBT/ICDT Joint Conference
Number of pages: 6
Volume: 2062
Publisher: CEUR Workshop Proceedings
Publication date: 1 Jan 2018
Status: Published - 1 Jan 2018
Event: 20th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data co-located with 10th EDBT/ICDT Joint Conference - TU Wien's Faculty of Electrical Engineering, Vienna, Austria
Duration: 26 Mar 2018 to 29 Mar 2018
Conference number: 20
http://www.cs.put.poznan.pl/events/DOLAP2018.html

Conference

Conference: 20th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data co-located with 10th EDBT/ICDT Joint Conference
Number: 20
Location: TU Wien's Faculty of Electrical Engineering
Country: Austria
City: Vienna
Period: 26/03/2018 to 29/03/2018
Internet address: http://www.cs.put.poznan.pl/events/DOLAP2018.html
Series name: CEUR Workshop Proceedings
Volume: 2062
ISSN: 1613-0073

Cite this

Andersen, O., Thomsen, C., & Torp, K. (2018). SimpleETL: ETL Processing by Simple Specifications. In Proceedings of the 20th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data co-located with 10th EDBT/ICDT Joint Conference (Vol. 2062). CEUR Workshop Proceedings. (CEUR Workshop Proceedings; Vol. 2062).
@inproceedings{f4a173227c654314a4392cb567a7ce27,
title = "SimpleETL: ETL Processing by Simple Specifications",
abstract = "Massive quantities of data are collected from many sources today. However, handling and integrating these data sources into a data warehouse is often labor-intensive. The complexity increases further when specific requirements exist. One such new requirement is the right to be forgotten, under which an organization must, upon request, delete all data about an individual. Another arises when facts are updated retrospectively. In this paper, we present the general framework SimpleETL, which is currently used for Extract-Transform-Load (ETL) processing in a company with such requirements. SimpleETL automatically handles all database interactions, such as creating fact tables, dimensions, and foreign keys. The framework also has features for handling version management of facts and implements four different methods for handling deleted facts. The framework enables, e.g., data scientists to program complete and complex ETL solutions very efficiently with only a few lines of code, which is demonstrated with a real-world example.",
author = "Ove Andersen and Christian Thomsen and Kristian Torp",
year = "2018",
month = "1",
day = "1",
language = "English",
volume = "2062",
booktitle = "Proceedings of the 20th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data co-located with 10th EDBT/ICDT Joint Conference",
publisher = "CEUR Workshop Proceedings",

}

TY - GEN

T1 - SimpleETL

T2 - ETL Processing by Simple Specifications

AU - Andersen, Ove

AU - Thomsen, Christian

AU - Torp, Kristian

PY - 2018/1/1

Y1 - 2018/1/1

AB - Massive quantities of data are collected from many sources today. However, handling and integrating these data sources into a data warehouse is often labor-intensive. The complexity increases further when specific requirements exist. One such new requirement is the right to be forgotten, under which an organization must, upon request, delete all data about an individual. Another arises when facts are updated retrospectively. In this paper, we present the general framework SimpleETL, which is currently used for Extract-Transform-Load (ETL) processing in a company with such requirements. SimpleETL automatically handles all database interactions, such as creating fact tables, dimensions, and foreign keys. The framework also has features for handling version management of facts and implements four different methods for handling deleted facts. The framework enables, e.g., data scientists to program complete and complex ETL solutions very efficiently with only a few lines of code, which is demonstrated with a real-world example.

UR - http://www.scopus.com/inward/record.url?scp=85044296032&partnerID=8YFLogxK

M3 - Article in proceeding

VL - 2062

BT - Proceedings of the 20th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data co-located with 10th EDBT/ICDT Joint Conference

PB - CEUR Workshop Proceedings

ER -