Abstract
Industrial systems, e.g., wind turbines, generate large amounts of data from reliable sensors at high velocity. As it is infeasible to store and query such amounts of data, only simple aggregates are currently stored. However, aggregates remove fluctuations and outliers that can reveal underlying problems, which limits the knowledge that can be gained from historical data. As a remedy, we present the distributed Time Series Management System (TSMS) ModelarDB, which uses models to store sensor data. To this end, we propose an online, adaptive multi-model compression algorithm that keeps data values within a user-defined error bound (possibly zero). We also propose (i) a database schema to store time series as models, (ii) methods to push down predicates to a key-value store utilizing this schema, (iii) optimized methods to execute aggregate queries on models, (iv) a method to optimize the execution of projections through static code generation, and (v) dynamic extensibility that allows new models to be used without recompiling the TSMS. Further, we present a general modular distributed TSMS architecture and its implementation, ModelarDB, as a portable library, using Apache Spark for query processing and Apache Cassandra for storage. An experimental evaluation shows that, unlike current systems, ModelarDB hits a sweet spot and offers fast ingestion, good compression, and fast, scalable online aggregate query processing at the same time. This is achieved by dynamically adapting to data sets using multiple models. The system degrades gracefully as more outliers occur, and the actual errors are much lower than the bounds.
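A short sketch can make the abstract's core idea concrete. The idea is to approximate a time series with models whose reconstructed values stay within a user-defined error bound, and then to answer aggregate queries directly on those models. The sketch below is illustrative only and makes several assumptions:

- It handles a single, regularly sampled series.
- It uses an absolute error bound.
- It uses only one model type: a constant-value model in the spirit of PMC-Mean (Poor Man's Compression), which represents each segment by its mean.

ModelarDB itself adapts between multiple models and stores them in Cassandra. None of that is reproduced here. All names (`Segment`, `compress`, `decompress`, `model_sum`, `model_avg`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical constant-value model for consecutive, regularly sampled points."""
    start_index: int  # position of the first data point the segment covers
    count: int        # number of data points the segment represents
    value: float      # mean of the covered points; used to reconstruct each of them


def compress(values: List[float], error_bound: float) -> List[Segment]:
    """Greedily grow a segment while every covered value stays within
    `error_bound` (absolute) of the segment mean; otherwise start a new one."""
    segments: List[Segment] = []
    start, total, count = 0, 0.0, 0
    lo = hi = 0.0
    for i, v in enumerate(values):
        if count == 0:
            # Open the first segment with the current point.
            start, total, count, lo, hi = i, v, 1, v, v
            continue
        new_total, new_count = total + v, count + 1
        mean = new_total / new_count
        new_lo, new_hi = min(lo, v), max(hi, v)
        # The largest deviation from the mean occurs at the minimum or the maximum.
        if max(new_hi - mean, mean - new_lo) <= error_bound:
            total, count, lo, hi = new_total, new_count, new_lo, new_hi
        else:
            # v cannot be added: emit the current segment and restart at v.
            segments.append(Segment(start, count, total / count))
            start, total, count, lo, hi = i, v, 1, v, v
    if count > 0:
        segments.append(Segment(start, count, total / count))
    return segments


def decompress(segments: List[Segment]) -> List[float]:
    """Reconstruct the approximated series from the models."""
    out: List[float] = []
    for s in segments:
        out.extend([s.value] * s.count)
    return out


def model_sum(segments: List[Segment]) -> float:
    """SUM computed directly on the models, without reconstructing points."""
    return sum(s.value * s.count for s in segments)


def model_avg(segments: List[Segment]) -> float:
    """AVG computed directly on the models (assumes at least one segment)."""
    return model_sum(segments) / sum(s.count for s in segments)


if __name__ == "__main__":
    readings = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9, 5.0]
    models = compress(readings, error_bound=0.25)
    print(models)
    print(model_sum(models), model_avg(models))
```

Each segment stores the mean of the points it covers. As a result, SUM and AVG computed on the models match those computed on the raw data, up to floating-point rounding, even though individual reconstructed values may differ by up to the error bound. This property belongs to this mean-based model and does not hold for lossy models in general. An error bound of zero degenerates to grouping runs of identical values.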
| Original language | English |
|---|---|
| Journal | Proceedings of the VLDB Endowment |
| Volume | 11 |
| Issue number | 11 |
| Pages (from-to) | 1688-1701 |
| Number of pages | 14 |
| ISSN | 2150-8097 |
| DOIs | |
| Publication status | Published - 1 Jul 2018 |