MapReduce-based Dimensional ETL Made Easy

Publication: Contribution to journal, Conference article in journal

17 Citations (Scopus)

Abstract

This paper demonstrates ETLMR, a novel dimensional Extract-Transform-Load (ETL) programming framework that uses MapReduce to achieve scalability. ETLMR has built-in native support for data warehouse (DW) specific constructs such as star schemas, snowflake schemas, and slowly changing dimensions (SCDs). This makes it possible to build MapReduce-based dimensional ETL flows very easily: the ETL process can be configured with only a few lines of code. We will demonstrate the concrete steps in using ETLMR to load data into a (partly snowflaked) DW schema. This includes configuration of data sources and targets, dimension processing schemes, fact processing, and deployment. In addition, we present ETLMR's scalability on large data sets.
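
To make the abstract's terms concrete, the following is a minimal sketch, not ETLMR's actual API (which this page does not show). It simulates the map, shuffle, and reduce phases in plain Python on one machine, assuming that each dimension is built by its own reduce task, which assigns surrogate keys and keeps type-2 SCD history (closing the current version and adding a new one when a non-key attribute changes). A second map phase then loads facts by looking up the dimension version valid on each row's date. All names (SOURCE, DIMS, map_dims, shuffle, reduce_dim, lookup, map_fact, run_etl) and the sample web-page data are hypothetical.

```python
from collections import defaultdict

# Hypothetical source data: one dict per downloaded web page.
SOURCE = [
    {"url": "a.com/x", "domain": "a.com", "server": "v1", "date": "2012-01-01", "size": 100},
    {"url": "b.com/y", "domain": "b.com", "server": "v1", "date": "2012-01-01", "size": 200},
    {"url": "a.com/x", "domain": "a.com", "server": "v2", "date": "2012-02-01", "size": 150},
]

# Hypothetical dimension specs: attribute columns, business-key columns,
# and whether the dimension keeps type-2 SCD history.
DIMS = {
    "domain": {"attrs": ("domain",), "key": ("domain",), "scd2": False},
    "page": {"attrs": ("url", "server"), "key": ("url",), "scd2": True},
}


def map_dims(row):
    """Map phase: emit one (dimension name, (date, attribute values)) pair per dimension."""
    for name, spec in DIMS.items():
        yield name, (row["date"], tuple(row[a] for a in spec["attrs"]))


def shuffle(pairs):
    """Group map output by key, as a MapReduce framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_dim(name, values):
    """Reduce phase: build one dimension table with surrogate keys.

    One reducer owns the whole dimension, so surrogate keys need no
    coordination between tasks.
    """
    spec = DIMS[name]
    key_idx = [spec["attrs"].index(k) for k in spec["key"]]
    table, current = [], {}  # current: business key -> index of its newest version
    for date, attrs in sorted(values):  # process changes in date order
        bkey = tuple(attrs[i] for i in key_idx)
        if bkey in current:
            cur = table[current[bkey]]
            if not spec["scd2"] or cur["attrs"] == attrs:
                continue  # duplicate, or non-SCD dimension: keep the first row
            cur["valid_to"] = date  # type 2: close the old version
        table.append({"id": len(table) + 1, "bkey": bkey, "attrs": attrs,
                      "valid_from": date, "valid_to": None})
        current[bkey] = len(table) - 1
    return table


def lookup(table, name, row):
    """Return the surrogate key of the dimension version valid at row['date']."""
    bkey = tuple(row[k] for k in DIMS[name]["key"])
    for version in table:
        if (version["bkey"] == bkey and version["valid_from"] <= row["date"]
                and (version["valid_to"] is None or row["date"] < version["valid_to"])):
            return version["id"]
    raise KeyError((name, bkey))


def map_fact(row, dims):
    """Fact map phase: replace business keys with surrogate keys, keep the measure."""
    fact = {f"{name}_id": lookup(dims[name], name, row) for name in DIMS}
    fact["size"] = row["size"]
    return fact


def run_etl(source):
    groups = shuffle(pair for row in source for pair in map_dims(row))
    dims = {name: reduce_dim(name, values) for name, values in groups.items()}
    facts = [map_fact(row, dims) for row in source]
    return dims, facts
```

Running `run_etl(SOURCE)` yields two domain rows and three page versions (the first version of `a.com/x` is closed on 2012-02-01 when its server value changes), and the third fact row points at the newest page version. A real framework would spread the reduce tasks across nodes; the snowflaked part of a schema would additionally require loading the referenced (outer) dimension before the dimension that points to it.
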
Original language: English
Journal: Proceedings of the VLDB Endowment
Volume: 5
Issue number: 12
Pages (from-to): 1882-1885
Number of pages: 4
ISSN: 2150-8097
Status: Published, Aug 2012
Event: International Conference on Very Large Data Bases - Istanbul, Turkey
Duration: 27 Aug 2012 to 31 Aug 2012
Conference number: 38

Conference

Conference: International Conference on Very Large Data Bases
Number: 38
Country: Turkey
City: Istanbul
Period: 27/08/2012 to 31/08/2012
