TY - GEN
T1 - Easy and Effective Parallel Programmable ETL
AU - Thomsen, Christian
AU - Pedersen, Torben Bach
N1 - Conference code: 14
PY - 2011
Y1 - 2011
N2 - Extract–Transform–Load (ETL) programs are used to load data into data warehouses (DWs). An ETL program must extract data from sources, apply different transformations to it, and use the DW to look up/insert the data. It is both time consuming to develop and to run an ETL program. It is, however, typically the case that the ETL program can exploit both task parallelism and data parallelism to run faster. This, on the other hand, makes the development time longer as it is complex to create a parallel ETL program. To remedy this situation, we propose efficient ways to parallelize typical ETL tasks and we implement these new constructs in an ETL framework. The constructs are easy to apply and do only require few modifications to an ETL program to parallelize it. They support both task and data parallelism and give the programmer different possibilities to choose from. An experimental evaluation shows that by using a little more CPU time, the (wall-clock) time to run an ETL program can be greatly reduced.
AB - Extract–Transform–Load (ETL) programs are used to load data into data warehouses (DWs). An ETL program must extract data from sources, apply different transformations to it, and use the DW to look up/insert the data. It is both time consuming to develop and to run an ETL program. It is, however, typically the case that the ETL program can exploit both task parallelism and data parallelism to run faster. This, on the other hand, makes the development time longer as it is complex to create a parallel ETL program. To remedy this situation, we propose efficient ways to parallelize typical ETL tasks and we implement these new constructs in an ETL framework. The constructs are easy to apply and do only require few modifications to an ETL program to parallelize it. They support both task and data parallelism and give the programmer different possibilities to choose from. An experimental evaluation shows that by using a little more CPU time, the (wall-clock) time to run an ETL program can be greatly reduced.
U2 - 10.1145/2064676.2064684
DO - 10.1145/2064676.2064684
M3 - Article in proceeding
SP - 37
EP - 44
BT - Proceedings of the ACM 14th International Workshop on Data Warehousing and OLAP
PB - Association for Computing Machinery (ACM)
CY - New York, NY, USA
T2 - ACM 14th International Workshop on Data Warehousing and OLAP
Y2 - 28 October 2011
ER -