Abstract
The widespread deployment of IoT systems in the real world has enabled the generation and collection of enormous amounts of sensor time series. One of the important techniques for extracting patterns from time series is temporal pattern mining (TPM). Unlike sequential pattern mining, TPM adds a temporal dimension, i.e., time intervals, to the extracted patterns, making them more informative. However, this extra temporal dimension adds an exponential factor to the growth of the search space and thus significantly increases the mining complexity. Current TPM approaches work sequentially and therefore cannot scale to large datasets. In this paper, we propose Distributed Hierarchical Pattern Graph TPM (DHPG-TPM), the first distributed solution that supports large-scale TPM using the leading distributed platform Apache Spark. DHPG-TPM employs efficient data structures, namely a distributed bitmap and a distributed Hierarchical Pattern Graph, that are carefully designed to work efficiently in a distributed environment and enable fast computation of support and confidence. To address the exponential search space of TPM, we design effective distributed pruning techniques based on the Apriori principle and the transitivity property of temporal relations, which reduce the search space while minimizing the communication overhead between cluster nodes. We conduct extensive experiments on real-world and synthetic datasets, showing that DHPG-TPM outperforms the sequential baselines and scales to very large datasets.
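As a rough illustration of two ideas named in the abstract, bitmap-based support counting and Apriori-style pruning, the following single-machine Python sketch may help. It is not the DHPG-TPM implementation: it ignores temporal relations and distribution over Spark entirely, and the function names (`build_bitmaps`, `support`, `frequent_pairs`), the toy sequences, and the threshold are all assumptions made for this example.

```python
from itertools import combinations


def build_bitmaps(sequences):
    """Map each event to an int whose bit i is set when the event occurs in sequence i."""
    bitmaps = {}
    for i, events in enumerate(sequences):
        for event in set(events):
            bitmaps[event] = bitmaps.get(event, 0) | (1 << i)
    return bitmaps


def support(bitmap, n_sequences):
    """Fraction of sequences whose bit is set in the bitmap."""
    return bin(bitmap).count("1") / n_sequences


def frequent_pairs(sequences, min_support):
    """Return co-occurring event pairs whose support reaches min_support."""
    n = len(sequences)
    bitmaps = build_bitmaps(sequences)
    # Apriori principle: a pair cannot be frequent if either of its events
    # is infrequent, so infrequent single events are dropped first.
    frequent = {e: b for e, b in bitmaps.items() if support(b, n) >= min_support}
    result = {}
    for a, b in combinations(sorted(frequent), 2):
        # A bitwise AND yields the sequences that contain both events.
        joint = frequent[a] & frequent[b]
        s = support(joint, n)
        if s >= min_support:
            result[(a, b)] = s
    return result


# Hypothetical toy data: each inner list holds the events seen in one sequence.
sequences = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["B", "D"],
]
pairs = frequent_pairs(sequences, min_support=0.5)
print(pairs)
```

In this toy run, event `D` is pruned before any pair containing it is examined, which is the effect the Apriori principle is meant to achieve on a much larger search space.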
Original language | English |
---|---|
Title of host publication | 2021 IEEE International Conference on Big Data (Big Data) |
Publisher | IEEE (Institute of Electrical and Electronics Engineers) |
Publication date | 7 Dec 2021 |
Article number | 9671753 |
ISBN (Print) | 978-1-6654-4599-3 |
ISBN (Electronic) | 978-1-6654-3902-2 |
DOIs | |
Publication status | Published - 7 Dec 2021 |
Event | 2021 IEEE International Conference on Big Data - Virtual Event; Duration: 15 Dec 2021 → 18 Dec 2021; Conference number: 9; https://bigdataieee.org/BigData2021/index.html |
Conference
Conference | 2021 IEEE International Conference on Big Data |
---|---|
Number | 9 |
Location | Virtual Event |
Period | 15/12/2021 → 18/12/2021 |
Internet address |
Keywords
- temporal patterns
- distributed artificial intelligence