Easy and Effective Parallel Programmable ETL

Publication: Contribution to book/anthology/report/conference proceeding; Conference article in proceeding; Research; Peer-reviewed

15 Citations (Scopus)

Abstract

Extract–Transform–Load (ETL) programs are used to load data
into data warehouses (DWs). An ETL program must extract data
from sources, apply different transformations to it, and use the DW
to look up and insert data. Both developing and running an ETL
program are time consuming. Typically, however, an ETL program
can exploit both task parallelism and data parallelism to run faster.
This in turn lengthens development time, because creating a parallel
ETL program is complex. To remedy this situation, we propose
efficient ways to parallelize typical ETL tasks and implement these
new constructs in an ETL framework. The constructs are easy to
apply and require only a few modifications to an ETL program to
parallelize it. They support both task and data parallelism and give
the programmer different possibilities to choose from. An
experimental evaluation shows that, by using a little more CPU time,
the (wall-clock) time to run an ETL program can be greatly reduced.
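
The two kinds of parallelism named in the abstract can be illustrated with a minimal Python sketch. It is not the framework's constructs, which this record does not describe; every name in it (`transform`, `data_parallel_transform`, `task_parallel_etl`, the row fields) is hypothetical. Threads are used only to keep the sketch self-contained; a CPU-bound ETL program would more likely use separate processes.

```python
from concurrent.futures import ThreadPoolExecutor
import queue
import threading


def transform(row):
    """Hypothetical row-level transformation: clean the name, compute a total."""
    return {"name": row["name"].strip().upper(),
            "total": row["qty"] * row["price"]}


def data_parallel_transform(rows, workers=4):
    """Data parallelism: the same transformation runs on different rows concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order.
        return list(pool.map(transform, rows))


_DONE = object()  # sentinel marking the end of the row stream


def task_parallel_etl(source_rows):
    """Task parallelism: extract, transform and load run as concurrent stages."""
    extracted = queue.Queue(maxsize=100)    # extract -> transform
    transformed = queue.Queue(maxsize=100)  # transform -> load
    loaded = []                             # stands in for the DW (assumption)

    def extract():
        for row in source_rows:
            extracted.put(row)
        extracted.put(_DONE)

    def transform_stage():
        while (row := extracted.get()) is not _DONE:
            transformed.put(transform(row))
        transformed.put(_DONE)

    def load():
        while (row := transformed.get()) is not _DONE:
            loaded.append(row)

    threads = [threading.Thread(target=stage)
               for stage in (extract, transform_stage, load)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return loaded
```

In the data-parallel variant the rows are split among workers that all perform the same step; in the task-parallel variant each step has its own worker and rows flow between them through bounded queues, so a later stage can start before an earlier one has finished.
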
Original language: English
Title: Proceedings of the ACM 14th International Workshop on Data warehousing and OLAP
Number of pages: 8
Place of publication: New York, NY, USA
Publisher: Association for Computing Machinery
Publication date: 2011
Pages: 37-44
ISBN (Electronic): 978-1-4503-0963-9
DOI
Status: Published - 2011
Event: ACM 14th International Workshop on Data Warehousing and OLAP - Glasgow, United Kingdom
Duration: 28 Oct 2011 → …
Conference number: 14

Conference

Conference: ACM 14th International Workshop on Data Warehousing and OLAP
Number: 14
Country/Territory: United Kingdom
City: Glasgow
Period: 28/10/2011 → …
