TY - GEN
T1 - Missing Value Imputation for Multi-attribute Sensor Data Streams via Message Propagation
AU - Li, Xiao
AU - Li, Huan
AU - Lu, Hua
AU - Jensen, Christian S.
AU - Pandey, Varun
AU - Markl, Volker
PY - 2023
Y1 - 2023
N2 - Sensor data streams occur widely in various real-time applications in the context of the Internet of Things (IoT). However, sensor data streams feature missing values due to factors such as sensor failures, communication errors, or depleted batteries. Missing values can compromise the quality of real-time analytics tasks and downstream applications. Existing imputation methods either make strong assumptions about streams or have low efficiency. In this study, we aim to accurately and efficiently impute missing values in data streams that satisfy only general characteristics in order to benefit real-time applications more widely. First, we propose a message propagation imputation network (MPIN) that is able to recover the missing values of data instances in a time window. We give a theoretical analysis of why MPIN is effective. Second, we present a continuous imputation framework that consists of data update and model update mechanisms to enable MPIN to perform continuous imputation both effectively and efficiently. Extensive experiments on multiple real datasets show that MPIN can outperform the existing data imputers by wide margins and that the continuous imputation framework is efficient and accurate.
U2 - 10.14778/3632093.3632100
DO - 10.14778/3632093.3632100
M3 - Conference article in Journal
SN - 2150-8097
VL - 17
SP - 345
EP - 358
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 3
T2 - 50th International Conference on Very Large Data Bases
Y2 - 25 August 2024 through 29 August 2024
ER -