Easy and Effective Parallel Programmable ETL
Publication: Research - peer-reviewed › Conference article in proceedings
Extract–Transform–Load (ETL) programs are used to load data
into data warehouses (DWs). An ETL program must extract data
from sources, apply different transformations to it, and use the DW
to look up and insert the data. Developing an ETL program and running
it are both time consuming. Typically, however, the
ETL program can exploit both task parallelism and data parallelism
to run faster. This, in turn, lengthens development, because
creating a parallel ETL program is complex. To remedy
this situation, we propose efficient ways to parallelize typical ETL
tasks and we implement these new constructs in an ETL framework.
The constructs are easy to apply and require only a few
modifications to an ETL program to parallelize it. They support
both task and data parallelism and give the programmer different
possibilities to choose from. An experimental evaluation shows
that by using a little more CPU time, the (wall-clock) time to run
an ETL program can be greatly reduced.
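
As a rough illustration of the two kinds of parallelism named in the abstract, the following Python sketch runs extraction and loading as separate tasks connected by a queue (task parallelism) and spreads a per-row transformation over a worker pool (data parallelism). It uses only the standard library. Every name in it (`extract`, `transform`, `load`, `run_etl`, `SOURCE_ROWS`) is hypothetical, the in-memory list stands in for a DW, and none of it reproduces the constructs proposed in the paper.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source data; a real program would read from files or databases.
SOURCE_ROWS = [{"id": i, "name": f"item {i}"} for i in range(8)]
DONE = object()  # sentinel marking the end of the row stream


def extract(out_q):
    """Task 1: read rows from the source and pass them downstream."""
    for row in SOURCE_ROWS:
        out_q.put(row)
    out_q.put(DONE)


def transform(row):
    """A per-row transformation that does not depend on any other row."""
    return {"id": row["id"], "name": row["name"].upper()}


def load(in_q, warehouse):
    """Task 2: transform and insert rows while extraction may still be running."""
    rows = iter(in_q.get, DONE)  # yields rows until the sentinel arrives
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Data parallelism: rows are handed to several workers as they arrive.
        # map() returns the results in input order once the stream has ended.
        for transformed in pool.map(transform, rows):
            warehouse.append(transformed)


def run_etl():
    """Run the extract and load tasks concurrently (task parallelism)."""
    q = queue.Queue(maxsize=4)  # bounded buffer between the two tasks
    warehouse = []              # assumption: a list stands in for the DW
    tasks = [
        threading.Thread(target=extract, args=(q,)),
        threading.Thread(target=load, args=(q, warehouse)),
    ]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return warehouse


if __name__ == "__main__":
    for row in run_etl():
        print(row)
```

Threads are used here only to keep the sketch short. In CPython, CPU-bound transformations would likely need separate processes to gain wall-clock time, which is consistent with the abstract's point that the speedup costs some additional CPU time.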
| Original language | English |
|---|---|
| Title | Proceedings of the ACM 14th International Workshop on Data Warehousing and OLAP |
| Number of pages | 8 |
| Place of publication | New York, NY, USA |
| Publisher | Association for Computing Machinery |
| Publication date | 2011 |
| Pages | 37-44 |
| ISBN (electronic) | 978-1-4503-0963-9 |
| DOI | |
| Status | Published |
Conference
| Conference | ACM 14th International Workshop on Data Warehousing and OLAP |
|---|---|
| Number | 14 |
| Country | United Kingdom |
| City | Glasgow |
| Period | 28-10-11 → … |
ID: 56614155