Abstract
Large amounts of spatial, textual, and temporal (STT) data are produced daily. Such data combines an unstructured component (text), a spatial component (geographic position), and a time component (timestamp), so there is a need for a powerful and general way of analyzing all three together. In this paper, we define and formalize the Spatio-Textual-Temporal Cube (STTCube) structure to enable effective and efficient combined analytical queries over STT data. Our novel data model over STT objects enables joint and integrated STT insights that are hard to obtain using existing methods. Furthermore, our proposed STTCube Incremental Maintenance (IMstt) method efficiently maintains an already constructed STTCube when new data arrives. Moreover, we introduce the new concept of STT measures with associated novel STT-OLAP operators. To allow for efficient large-scale analytics, we present a pre-aggregation framework for exact and approximate computation of STT measures. Our comprehensive experimental evaluation on a real-world Twitter dataset confirms that our proposed methods reduce query response time by 1–5 orders of magnitude compared to the No Materialization baseline and decrease storage cost by between 97% and 99.9% compared to the Full Materialization baseline, while adding only negligible overhead to the STTCube construction time. Approximate computation achieves an accuracy between 90% and 100% while reducing query response time by 3–5 orders of magnitude compared to No Materialization. Finally, IMstt achieves an order of magnitude improvement in maintenance time compared to the baseline maintenance method.
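To make the idea of pre-aggregating a measure over spatial, textual, and temporal dimensions more concrete, the following is a minimal toy sketch. It is not the STTCube data model or the IMstt algorithm from the paper; the class `ToySTTCube`, the helper `cell_of`, the one-degree grid, and the whitespace tokenizer are all hypothetical simplifications chosen for illustration. It covers only an exact term-count measure, which can be maintained incrementally because counts are additive.

```python
from collections import Counter
from datetime import datetime


def cell_of(lat, lon, size=1.0):
    """Map a coordinate to a hypothetical square grid cell of `size` degrees."""
    return (int(lat // size), int(lon // size))


class ToySTTCube:
    """Toy pre-aggregated cube: term counts keyed by (grid cell, term, day)."""

    def __init__(self):
        self.cells = Counter()

    def insert(self, lat, lon, text, ts):
        # Incremental update: a new object only touches the counters it
        # contributes to; nothing already aggregated is recomputed.
        region = cell_of(lat, lon)
        day = ts.date()
        for term in text.lower().split():  # naive tokenizer (assumption)
            self.cells[(region, term, day)] += 1

    def roll_up_to_month(self):
        # Roll-up along the temporal dimension: day -> (year, month).
        out = Counter()
        for (region, term, day), n in self.cells.items():
            out[(region, term, (day.year, day.month))] += n
        return out

    def top_terms(self, region, k=1):
        # Answer a "top-k terms in a region" query from the aggregates only.
        totals = Counter()
        for (r, term, _day), n in self.cells.items():
            if r == region:
                totals[term] += n
        return totals.most_common(k)


cube = ToySTTCube()
cube.insert(57.05, 9.92, "rain in Aalborg", datetime(2022, 9, 1, 8, 0))
cube.insert(57.02, 9.95, "more rain today", datetime(2022, 9, 2, 9, 30))
cube.insert(55.68, 12.57, "sunny Copenhagen", datetime(2022, 9, 2, 10, 0))

print(cube.top_terms((57, 9)))
print(cube.roll_up_to_month()[((57, 9), "rain", (2022, 9))])
```

In this sketch, queries read only the stored counters and never rescan the raw objects, which is the general intuition behind materializing aggregates; the paper's framework additionally addresses which aggregates to materialize and how to approximate measures, which this example does not attempt.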
Original language | English |
---|---|
Article number | 102009 |
Journal | Information Systems |
Volume | 108 |
ISSN | 0306-4379 |
Publication status | Published - Sept 2022 |
Bibliographical note
Funding Information: This research is partially funded by the European Commission through the Erasmus Mundus Joint Doctorate Information Technologies for Business Intelligence (EM IT4BI-DC), the Danish Council for Independent Research (DFF) under grant agreement no. DFF-8048-00051B, Aalborg University’s Talent Programme, and the Poul Due Jensen Foundation. Furthermore, Matteo Lissandrini is supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 838216.
Publisher Copyright:
© 2022 The Authors
Keywords
- Data cube
- OLAP
- Spatial analytics
- Spatial-textual-temporal measures
- Spatio-textual-temporal data
- Textual analytics