TY - GEN
T1 - Machine Learning Platform for Extreme Scale Computing on Compressed IoT Data
AU - Tirupathi, Seshu
AU - Salwala, Dhaval
AU - Zizzo, Giulio
AU - Rawat, Ambrish
AU - Purcell, Mark
AU - Jensen, Søren Kejser
AU - Thomsen, Christian
AU - Ho, Nguyen
AU - Cuza, Carlos E. Muniz
AU - Brusokas, Jonas
AU - Pedersen, Torben Bach
AU - Alexiou, Giorgos
AU - Giannopoulos, Giorgos
AU - Gidarakos, Panagiotis
AU - Kalimeris, Alexandros
AU - Maroulis, Stavros
AU - Papastefanatos, George
AU - Psarros, Ioannis
AU - Stamatopoulos, Vassilis
AU - Terrovitis, Manolis
PY - 2022/12/20
Y1 - 2022/12/20
AB - With the falling cost of sensors, high-volume and high-velocity data are increasingly being generated and analyzed, especially in IoT domains like energy and smart homes. Consequently, applications that require accurate short-term forecasts and predictions are also steadily increasing. In this paper, we provide an overview of a novel end-to-end platform that provides efficient ingestion, compression, transfer, query processing, and machine learning-based analytics for high-frequency and high-volume time series from IoT. The performance of the platform is evaluated using a real-world dataset from renewable energy source (RES) installations. The results show the importance of high-frequency analytics and the surprisingly positive impact of error-bounded lossy compression on machine learning in the form of AutoML. For example, when detecting yaw misalignments in wind turbines, AutoML models on lossy compressed data achieved a 9% improvement in accuracy compared to models on 10-minute aggregated data, the current industry standard. These small-scale experiments thus show the potential of the platform, and larger pilots are planned.
KW - Big Data
KW - Data models
KW - Industries
KW - Machine learning
KW - Query processing
KW - Smart homes
KW - Time series analysis
KW - Lossy Data Compression
KW - Cloud
KW - Lossless Data Compression
KW - Edge
KW - Renewable Energy Sources
UR - http://www.scopus.com/inward/record.url?scp=85147976313&partnerID=8YFLogxK
U2 - 10.1109/BigData55660.2022.10020540
DO - 10.1109/BigData55660.2022.10020540
M3 - Article in proceedings
SN - 978-1-6654-8046-8
SP - 3179
EP - 3185
BT - 2022 IEEE International Conference on Big Data (Big Data)
A2 - Tsumoto, Shusaku
A2 - Ohsawa, Yukio
A2 - Chen, Lei
A2 - Van den Poel, Dirk
A2 - Hu, Xiaohua
A2 - Motomura, Yoichi
A2 - Takagi, Takuya
A2 - Wu, Lingfei
A2 - Xie, Ying
A2 - Abe, Akihiro
A2 - Raghavan, Vijay
PB - IEEE Communications Society
T2 - 2022 IEEE International Conference on Big Data (Big Data)
Y2 - 17 December 2022 through 20 December 2022
ER -