Easy and Effective Parallel Programmable ETL

Research output: Contribution to book/anthology/report/conference proceeding; Article in proceeding; Research; peer-review

15 Citations (Scopus)

Abstract

Extract–Transform–Load (ETL) programs are used to load data
into data warehouses (DWs). An ETL program must extract data
from sources, apply different transformations to it, and use the DW
to look up/insert the data. Both developing and running an ETL
program are time consuming. Typically, however, an ETL program
can exploit both task parallelism and data parallelism
to run faster. This, in turn, lengthens development
time, because creating a parallel ETL program is complex. To remedy
this situation, we propose efficient ways to parallelize typical ETL
tasks and we implement these new constructs in an ETL framework.
The constructs are easy to apply and require only a few
modifications to an ETL program to parallelize it. They support
both task and data parallelism and give the programmer different
possibilities to choose from. An experimental evaluation shows
that by using a little more CPU time, the (wall-clock) time to run
an ETL program can be greatly reduced.
Original language: English
Title of host publication: Proceedings of the ACM 14th International Workshop on Data Warehousing and OLAP
Number of pages: 8
Place of Publication: New York, NY, USA
Publisher: Association for Computing Machinery (ACM)
Publication date: 2011
Pages: 37-44
ISBN (Electronic): 978-1-4503-0963-9
Publication status: Published - 2011
Event: ACM 14th International Workshop on Data Warehousing and OLAP - Glasgow, United Kingdom
Duration: 28 Oct 2011 → …
Conference number: 14

Conference

Conference: ACM 14th International Workshop on Data Warehousing and OLAP
Number: 14
Country/Territory: United Kingdom
City: Glasgow
Period: 28/10/2011 → …
