Creating and Querying Data Cubes in Python using pyCube

Research output: Contribution to book/anthology/report/conference proceeding; Article in proceeding; Research; peer-review

Abstract

Data cubes are used to analyze large data sets, usually stored in data
warehouses. The most popular data cube tools rely on graphical user
interfaces (GUIs) for data analysis. Traditionally, this was necessary
because data analysts were not expected to be technical people. In the
subsequent decades, however, the data landscape changed dramatically,
requiring companies to employ large teams of highly technical data
scientists to manage and use the ever-increasing amount of data. These
data scientists generally use tools such as Python, interactive
notebooks, and pandas, while modern data cube tools are still GUI-based.
To bridge this gap, this paper proposes a Python-based data cube tool
called pyCube. pyCube semi-automatically creates data cubes for data
stored in an RDBMS and manages the data cube metadata. pyCube's
programmatic interface enables data scientists to query data cubes by
specifying the metadata of the desired result. pyCube is experimentally
evaluated on the Star Schema Benchmark (SSB). The results show that
pyCube vastly outperforms different implementations of SSB queries in
pandas in both runtime and memory use, while its queries are easier to
read and write.
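
For orientation, the following is a minimal sketch of the kind of roll-up query that a data cube answers over a star schema held in an RDBMS. It is written directly in SQL through Python's standard sqlite3 module and does not use or depict pyCube's interface. The table and column names (lineorder, date_dim, part_dim) and the sample rows are illustrative assumptions, loosely modeled on SSB.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn.executescript("""
        CREATE TABLE date_dim (datekey INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE part_dim (partkey INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE lineorder (
            datekey INTEGER REFERENCES date_dim(datekey),
            partkey INTEGER REFERENCES part_dim(partkey),
            revenue INTEGER
        );
    """)
    # Made-up sample rows, for illustration only.
    conn.executemany("INSERT INTO date_dim VALUES (?, ?)",
                     [(19930101, 1993), (19940101, 1994)])
    conn.executemany("INSERT INTO part_dim VALUES (?, ?)",
                     [(1, "MFGR#11"), (2, "MFGR#12")])
    conn.executemany("INSERT INTO lineorder VALUES (?, ?, ?)",
                     [(19930101, 1, 100), (19930101, 2, 50),
                      (19940101, 1, 70), (19940101, 1, 30)])


def revenue_by_year_and_category(conn):
    """Roll the revenue measure up to the (year, category) level."""
    return conn.execute("""
        SELECT d.year, p.category, SUM(f.revenue) AS revenue
        FROM lineorder AS f
        JOIN date_dim AS d ON d.datekey = f.datekey
        JOIN part_dim AS p ON p.partkey = f.partkey
        GROUP BY d.year, p.category
        ORDER BY d.year, p.category
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
result = revenue_by_year_and_category(conn)
for row in result:
    print(row)
```

Here the aggregation runs inside the database engine, and only the grouped result is returned to Python. A pandas version would typically load and merge the tables in memory first.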
Original language: English
Title of host publication: Big Data Analytics and Knowledge Discovery
Number of pages: 15
Publication date: 2024
Publication status: Published - 2024
