MapReduce-based Dimensional ETL Made Easy

Xiufeng Liu; Christian Thomsen; Torben Bach Pedersen

Publication: Contribution to journal › Conference article in journal › Research › peer-reviewed

Abstract

This paper demonstrates ETLMR, a novel dimensional Extract–Transform–Load (ETL) programming framework that uses MapReduce to achieve scalability. ETLMR has built-in native support for data warehouse (DW) specific constructs such as star schemas, snowflake schemas, and slowly changing dimensions (SCDs). This makes it possible to build MapReduce-based dimensional ETL flows very easily. The ETL process can be configured with only a few lines of code. We will demonstrate the concrete steps in using ETLMR to load data into a (partly snowflaked) DW schema. This includes configuration of data sources and targets, dimension processing schemes, fact processing, and deployment. In addition, we present the scalability on large data sets.

Original language	English
Journal	Proceedings of the VLDB Endowment
Volume	5
Issue number	12
Pages (from-to)	1882-1885
Number of pages	4
ISSN	2150-8097
Status	Published - Aug. 2012
Event	International Conference on Very Large Data Bases - Istanbul, Turkey. Duration: 27 Aug. 2012 → 31 Aug. 2012. Conference number: 38

Conference

Conference	International Conference on Very Large Data Bases
Number	38
Country/Territory	Turkey
City	Istanbul
Period	27/08/2012 → 31/08/2012

Access to document

http://vldb.org/pvldb/vol5/p1882_xiufengliu_vldb2012.pdf

Citation formats

@article{ac24fc9fe061417b9adc382b56de56e1,

title = "MapReduce-based Dimensional ETL Made Easy",

abstract = "This paper demonstrates ETLMR, a novel dimensional Extract–Transform–Load (ETL) programming framework that uses MapReduce to achieve scalability. ETLMR has built-in native support for data warehouse (DW) specific constructs such as star schemas, snowflake schemas, and slowly changing dimensions (SCDs). This makes it possible to build MapReduce-based dimensional ETL flows very easily. The ETL process can be configured with only a few lines of code. We will demonstrate the concrete steps in using ETLMR to load data into a (partly snowflaked) DW schema. This includes configuration of data sources and targets, dimension processing schemes, fact processing, and deployment. In addition, we present the scalability on large data sets.",

author = "Liu, Xiufeng and Thomsen, Christian and Pedersen, {Torben Bach}",

year = "2012",

month = aug,

language = "English",

volume = "5",

pages = "1882--1885",

journal = "Proceedings of the VLDB Endowment",

issn = "2150-8097",

publisher = "VLDB Endowment",

number = "12",

note = "International Conference on Very Large Data Bases ; Conference date: 27-08-2012 Through 31-08-2012",

}

TY - GEN

T1 - MapReduce-based Dimensional ETL Made Easy

AU - Liu, Xiufeng

AU - Thomsen, Christian

AU - Pedersen, Torben Bach

N1 - Conference code: 38

PY - 2012/8

Y1 - 2012/8

N2 - This paper demonstrates ETLMR, a novel dimensional Extract–Transform–Load (ETL) programming framework that uses MapReduce to achieve scalability. ETLMR has built-in native support for data warehouse (DW) specific constructs such as star schemas, snowflake schemas, and slowly changing dimensions (SCDs). This makes it possible to build MapReduce-based dimensional ETL flows very easily. The ETL process can be configured with only a few lines of code. We will demonstrate the concrete steps in using ETLMR to load data into a (partly snowflaked) DW schema. This includes configuration of data sources and targets, dimension processing schemes, fact processing, and deployment. In addition, we present the scalability on large data sets.

AB - This paper demonstrates ETLMR, a novel dimensional Extract–Transform–Load (ETL) programming framework that uses MapReduce to achieve scalability. ETLMR has built-in native support for data warehouse (DW) specific constructs such as star schemas, snowflake schemas, and slowly changing dimensions (SCDs). This makes it possible to build MapReduce-based dimensional ETL flows very easily. The ETL process can be configured with only a few lines of code. We will demonstrate the concrete steps in using ETLMR to load data into a (partly snowflaked) DW schema. This includes configuration of data sources and targets, dimension processing schemes, fact processing, and deployment. In addition, we present the scalability on large data sets.

UR - http://www.scopus.com/inward/record.url?scp=84872952928&partnerID=8YFLogxK

M3 - Conference article in Journal

SN - 2150-8097

VL - 5

SP - 1882

EP - 1885

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

IS - 12

T2 - International Conference on Very Large Data Bases

Y2 - 27 August 2012 through 31 August 2012

ER -
