TY - GEN
T1 - Scalable Model-Based Management of Correlated Dimensional Time Series in ModelarDB+
AU - Jensen, Søren Kejser
AU - Pedersen, Torben Bach
AU - Thomsen, Christian
N1 - Conference code: 2021
PY - 2021/4/22
Y1 - 2021/4/22
AB - To monitor critical infrastructure, high quality sensors sampled at a high frequency are increasingly used. However, as they produce huge amounts of data, only simple aggregates are stored. This removes outliers and fluctuations that could indicate problems. As a remedy, we present a model-based approach for managing time series with dimensions that exploits correlation in and among time series. Specifically, we propose compressing groups of correlated time series using an extensible set of model types within a user-defined error bound (possibly zero). We name this new category of model-based compression methods for time series Multi-Model Group Compression (MMGC). We present the first MMGC method GOLEMM and extend model types to compress time series groups. We propose primitives for users to effectively define groups for differently sized data sets, and based on these, an automated grouping method using only the time series dimensions. We propose algorithms for executing simple and multi-dimensional aggregate queries on models. Last, we implement our methods in the Time Series Management System (TSMS) ModelarDB (ModelarDB+). Our evaluation shows that compared to widely used formats, ModelarDB+ provides up to 13.7x faster ingestion due to high compression, 113x better compression due to the adaptivity of GOLEMM, 573x faster aggregates by using models, and close to linear scalability. It is also extensible and supports online query processing.
UR - http://www.scopus.com/inward/record.url?scp=85112868430&partnerID=8YFLogxK
U2 - 10.1109/ICDE51399.2021.00123
DO - 10.1109/ICDE51399.2021.00123
M3 - Article in proceeding
SN - 978-1-7281-9185-0
T3 - Proceedings of the International Conference on Data Engineering
SP - 1380
EP - 1391
BT - Proceedings of the 37th IEEE International Conference on Data Engineering
PB - IEEE (Institute of Electrical and Electronics Engineers)
T2 - 37th International Conference on Data Engineering
Y2 - 19 April 2021 through 22 April 2021
ER -