SETL: A programmable semantic extract-transform-load framework for semantic data warehouses

Publication: Contribution to journal, Journal article, Research, peer-reviewed

9 Citations (Scopus)

Abstract

In order to create better decisions for business analytics, organizations increasingly use external structured, semi-structured, and unstructured data in addition to the (mostly structured) internal data. Current Extract-Transform-Load (ETL) tools are not suitable for this “open world scenario” because they do not consider semantic issues in the integration processing. Current ETL tools neither support processing semantic data nor create a semantic Data Warehouse (DW), a repository of semantically integrated data. This paper describes our programmable Semantic ETL (SETL) framework. SETL builds on Semantic Web (SW) standards and tools and supports developers by offering a number of powerful modules, classes, and methods for (dimensional and semantic) DW constructs and tasks. Thus it supports semantic data sources in addition to traditional data sources, semantic integration, and creating or publishing a semantic (multidimensional) DW in terms of a knowledge base. A comprehensive experimental evaluation comparing SETL to a solution made with traditional tools (requiring much more hand-coding) on a concrete use case shows that SETL provides better programmer productivity, knowledge base quality, and performance.
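To make the extract, transform, and load steps of a semantic DW pipeline concrete, here is a minimal, hypothetical sketch in Python using only the standard library. It is not SETL's API and does not reproduce the paper's vocabulary: the namespace `http://example.org/dw/`, the helpers `mint`, `prop`, `extract`, `transform`, and `load`, and the sample CSV are all illustrative assumptions. Each source row becomes an observation loosely modelled on the W3C RDF Data Cube (`qb:Observation`), and lower-casing dimension keys is a deliberately naive stand-in for semantic integration, so that "Milk" and " milk " from two rows resolve to one resource.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base namespace for this sketch; not SETL's vocabulary.
EX = "http://example.org/dw/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
QB_OBSERVATION = "<http://purl.org/linked-data/cube#Observation>"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def mint(kind: str, key: str) -> str:
    """Mint an IRI for a dimension member. Normalising the key (strip and
    lower-case) is a naive stand-in for semantic integration."""
    return f"<{EX}{kind}/{quote(key.strip().lower(), safe='')}>"


def prop(name: str) -> str:
    """Build an IRI for a (hypothetical) schema property."""
    return f"<{EX}schema#{name}>"


def extract(csv_text: str) -> list:
    """Extract: read rows from a tabular (non-semantic) source."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: list) -> list:
    """Transform: turn each row into Data Cube-style observation triples."""
    triples = []
    for i, row in enumerate(rows):
        obs = f"<{EX}observation/{i}>"
        triples += [
            (obs, RDF_TYPE, QB_OBSERVATION),
            (obs, prop("product"), mint("product", row["product"])),
            (obs, prop("region"), mint("region", row["region"])),
            (obs, prop("amount"), f'"{int(row["amount"])}"^^<{XSD_INTEGER}>'),
        ]
    return triples


def load(triples: list) -> str:
    """Load: serialise the triples as N-Triples text for a triple store."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


source = "product,region,amount\nMilk,North,10\n milk ,South,5\n"
nt = load(transform(extract(source)))
print(nt)
```

A real semantic DW would also describe the cube structure (dimensions, levels, hierarchies, measures) and use a proper RDF library and triple store; string-built N-Triples are used here only to keep the sketch self-contained.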
Original language: English
Journal: Information Systems
Volume: 68
Pages (from-to): 17-43
ISSN: 0306-4379
DOI: 10.1016/j.is.2017.01.005
Status: Published - 4 Mar 2017

Fingerprint

Data warehouses
Semantics
Mathematical transformations
Semantic Web
Processing
Productivity

Cite this

@article{5c6fc0feeb764a34a8d5b0bc367bf428,
title = "{SETL}: A programmable semantic extract-transform-load framework for semantic data warehouses",
abstract = "In order to create better decisions for business analytics, organizations increasingly use external structured, semi-structured, and unstructured data in addition to the (mostly structured) internal data. Current Extract-Transform-Load (ETL) tools are not suitable for this “open world scenario” because they do not consider semantic issues in the integration processing. Current ETL tools neither support processing semantic data nor create a semantic Data Warehouse (DW), a repository of semantically integrated data. This paper describes our programmable Semantic ETL (SETL) framework. SETL builds on Semantic Web (SW) standards and tools and supports developers by offering a number of powerful modules, classes, and methods for (dimensional and semantic) DW constructs and tasks. Thus it supports semantic data sources in addition to traditional data sources, semantic integration, and creating or publishing a semantic (multidimensional) DW in terms of a knowledge base. A comprehensive experimental evaluation comparing SETL to a solution made with traditional tools (requiring much more hand-coding) on a concrete use case, shows that SETL provides better programmer productivity, knowledge base quality, and performance.",
keywords = "ETL, RDF, Semantic Integration, data warehouses, Semantic-aware Knowledge base",
author = "Rudra Nath and Katja Hose and Pedersen, {Torben Bach} and Oscar Romero",
year = "2017",
month = mar,
day = "4",
doi = "10.1016/j.is.2017.01.005",
language = "English",
volume = "68",
pages = "17--43",
journal = "Information Systems",
issn = "0306-4379",
publisher = "Pergamon Press",

}

SETL: A programmable semantic extract-transform-load framework for semantic data warehouses. / Nath, Rudra; Hose, Katja; Pedersen, Torben Bach; Romero, Oscar.

In: Information Systems, Vol. 68, 04.03.2017, pp. 17-43.


TY  - JOUR
T1  - SETL: A programmable semantic extract-transform-load framework for semantic data warehouses
AU  - Nath, Rudra
AU  - Hose, Katja
AU  - Pedersen, Torben Bach
AU  - Romero, Oscar
PY  - 2017/3/4
Y1  - 2017/3/4
N2  - In order to create better decisions for business analytics, organizations increasingly use external structured, semi-structured, and unstructured data in addition to the (mostly structured) internal data. Current Extract-Transform-Load (ETL) tools are not suitable for this “open world scenario” because they do not consider semantic issues in the integration processing. Current ETL tools neither support processing semantic data nor create a semantic Data Warehouse (DW), a repository of semantically integrated data. This paper describes our programmable Semantic ETL (SETL) framework. SETL builds on Semantic Web (SW) standards and tools and supports developers by offering a number of powerful modules, classes, and methods for (dimensional and semantic) DW constructs and tasks. Thus it supports semantic data sources in addition to traditional data sources, semantic integration, and creating or publishing a semantic (multidimensional) DW in terms of a knowledge base. A comprehensive experimental evaluation comparing SETL to a solution made with traditional tools (requiring much more hand-coding) on a concrete use case, shows that SETL provides better programmer productivity, knowledge base quality, and performance.
AB  - In order to create better decisions for business analytics, organizations increasingly use external structured, semi-structured, and unstructured data in addition to the (mostly structured) internal data. Current Extract-Transform-Load (ETL) tools are not suitable for this “open world scenario” because they do not consider semantic issues in the integration processing. Current ETL tools neither support processing semantic data nor create a semantic Data Warehouse (DW), a repository of semantically integrated data. This paper describes our programmable Semantic ETL (SETL) framework. SETL builds on Semantic Web (SW) standards and tools and supports developers by offering a number of powerful modules, classes, and methods for (dimensional and semantic) DW constructs and tasks. Thus it supports semantic data sources in addition to traditional data sources, semantic integration, and creating or publishing a semantic (multidimensional) DW in terms of a knowledge base. A comprehensive experimental evaluation comparing SETL to a solution made with traditional tools (requiring much more hand-coding) on a concrete use case, shows that SETL provides better programmer productivity, knowledge base quality, and performance.
KW  - ETL
KW  - RDF
KW  - Semantic Integration
KW  - data warehouses
KW  - Semantic-aware Knowledge base
UR  - http://www.sciencedirect.com/science/article/pii/S0306437916302101
U2  - 10.1016/j.is.2017.01.005
DO  - 10.1016/j.is.2017.01.005
M3  - Journal article
VL  - 68
SP  - 17
EP  - 43
JO  - Information Systems
JF  - Information Systems
SN  - 0306-4379
ER  - 