An adaptive information-theoretic approach for identifying temporal correlations in big data sets

Nguyen Ho, Huy Vo, Mai Vu

Research output: Contribution to journal › Conference article in Journal › Research › peer-review

8 Citations (Scopus)

Abstract

In the past two decades, new developments in computing, sensing, and crowdsourced data have resulted in an explosion in the availability of quantitative information. The possibilities of analyzing this so-called 'big data' to inform research and the decision-making process are virtually endless. In general, analyses have to be done across multiple data sets to extract the most value from big data. An important first step is to identify temporal correlations between data sets. Given the volume and velocity of big data, techniques that identify correlations need not only to be scalable but also to help users order the correlations across temporal resolutions so that they can focus on important relationships. There is a large body of work in this area; however, most existing approaches deal only with small data sets, use a fixed temporal resolution, or do not provide a quantifiable measure of a correlation's significance. In this paper, we present a method based on mutual information to identify correlations in large data sets. Discovered correlations are suggested to users in an order based on their significance. Our method supports an adaptive streaming technique that minimizes duplicated computation and is implemented on top of Apache Spark for scalability on big data platforms. We also provide a comprehensive evaluation using real-world data sets from NYC Open Data and compare our findings against a recent study.
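As a rough illustration of the idea in the abstract, the sketch below estimates mutual information between two time series at several temporal resolutions and orders the resolutions by score. It is not the authors' method: the equal-width binning, the non-overlapping averaging windows, the plug-in estimator, and all function names (`aggregate`, `discretize`, `mutual_information`, `rank_resolutions`) are assumptions made for this example, and the adaptive streaming and Apache Spark parts are not modelled at all.

```python
import math
from collections import Counter


def aggregate(series, window):
    """Average a series over consecutive, non-overlapping windows.

    Trailing samples that do not fill a whole window are dropped.
    """
    n = len(series) // window
    return [sum(series[i * window:(i + 1) * window]) / window for i in range(n)]


def discretize(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant series: a single bin
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete labels."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_resolutions(a, b, windows, bins=4):
    """Score each window size (hypothetical resolution) and sort by MI."""
    scores = []
    for w in windows:
        xa = discretize(aggregate(a, w), bins)
        xb = discretize(aggregate(b, w), bins)
        scores.append((w, mutual_information(xa, xb)))
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

A call such as `rank_resolutions(series_a, series_b, windows=[1, 24, 168])` returns `(window, score)` pairs with the highest score first. Note that a plug-in estimate is biased upward when few samples remain after aggregation, so raw scores at coarse resolutions are not directly comparable to those at fine ones without some correction or significance test.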

Original language: English
Journal: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
Pages (from-to): 666-675
Number of pages: 10
Publication status: Published - 2016
Externally published: Yes
Event: 4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States
Duration: 5 Dec 2016 to 8 Dec 2016

Conference

Conference: 4th IEEE International Conference on Big Data, Big Data 2016
Country/Territory: United States
City: Washington
Period: 05/12/2016 to 08/12/2016
Sponsor: Cisco, et al., Huawei Technologies Co., Ltd., IEEE, IEEE Computer Society, National Science Foundation (NSF)

Bibliographical note

Publisher Copyright:
© 2016 IEEE.

Keywords

  • adaptive sliding window
  • Big Data
  • mutual information
  • streaming
  • temporal correlation
