TY - GEN
T1 - Towards longitudinal analytics on social media data
AU - Xia, Fan
AU - Yang, Bin
AU - Yu, Chengcheng
AU - Qian, Weining
AU - Zhou, Aoying
PY - 2019/4
Y1 - 2019/4
N2 - We are witnessing increasing interest in longitudinal analytics on social media data. Longitudinal analytics takes into account an interval and considers the temporal popularity of social media data in that interval, rather than considering only recently generated social media data as in real-time search. We study a fundamental functionality in longitudinal analytics - top-k temporal keyword (TkTK) querying. A TkTK query takes as input a set of query keywords and an interval, and returns the top-k most significant social items, e.g., tweets, where the significance of a social item is defined based on a combination of textual relevance and temporal popularity. We model social media data as a forest of linkage trees along the time dimension, which captures the propagation processes, e.g., replies and forwards, among different social items. Based on the forest, we model the temporal popularity of a social item across time as a popularity time series. We design two indexing structures that index social items' popularity time series and textual content in a holistic manner - the temporal popularity inverted index (TPII) and the log-structured merge octree (LSMO). Empirical studies with two substantial social media data sets offer insight into the design properties of the indexes and confirm that LSMO enables both efficient query processing and indexing structure updates.
AB - We are witnessing increasing interest in longitudinal analytics on social media data. Longitudinal analytics takes into account an interval and considers the temporal popularity of social media data in that interval, rather than considering only recently generated social media data as in real-time search. We study a fundamental functionality in longitudinal analytics - top-k temporal keyword (TkTK) querying. A TkTK query takes as input a set of query keywords and an interval, and returns the top-k most significant social items, e.g., tweets, where the significance of a social item is defined based on a combination of textual relevance and temporal popularity. We model social media data as a forest of linkage trees along the time dimension, which captures the propagation processes, e.g., replies and forwards, among different social items. Based on the forest, we model the temporal popularity of a social item across time as a popularity time series. We design two indexing structures that index social items' popularity time series and textual content in a holistic manner - the temporal popularity inverted index (TPII) and the log-structured merge octree (LSMO). Empirical studies with two substantial social media data sets offer insight into the design properties of the indexes and confirm that LSMO enables both efficient query processing and indexing structure updates.
KW - Social media data
KW - Temporal keyword query
KW - Time series
UR - http://www.scopus.com/inward/record.url?scp=85067990111&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2019.00039
DO - 10.1109/ICDE.2019.00039
M3 - Article in proceedings
AN - SCOPUS:85067990111
SN - 978-1-5386-7475-8
T3 - Proceedings of the International Conference on Data Engineering
SP - 350
EP - 361
BT - Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019
PB - IEEE
T2 - 35th IEEE International Conference on Data Engineering, ICDE 2019
Y2 - 8 April 2019 through 11 April 2019
ER -