Location Inference for Non-geotagged Tweets in User Timelines

Pengfei Li, Hua Lu, Nattiya Kanhabua, Sha Zhao, Gang Pan

Research output: Contribution to journal › Journal article › Research › peer-review

2 Citations (Scopus)

Abstract

As social media users increasingly go mobile, various location-based services (LBS) have been deployed on social media platforms such as Twitter. Their success depends heavily on the availability and accuracy of users' location information. However, only a small fraction of tweets are geotagged, so locations must be inferred for the remaining tweets if LBS are to serve their purpose. In this paper, we tackle this problem by scrutinizing Twitter user timelines. First, we split each user's tweet timeline temporally into a number of clusters, each tending to imply a distinct location. Subsequently, we adapt two machine learning models and design classifiers that assign each tweet cluster to one of the predefined city-level location classes. The Bayes-based model focuses on the information gain of words with location implications in user-generated content. The convolutional LSTM model treats user-generated content and its associated locations as sequences and employs a bidirectional LSTM and a convolution operation to infer locations. The two models are evaluated on a large real data set. The results suggest that our models are effective at inferring locations for non-geotagged tweets and that they significantly outperform state-of-the-art and alternative approaches in terms of inference accuracy.
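The abstract does not say how timelines are split or how information gain is computed, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes a timeline is a list of (timestamp, text) pairs, that a new cluster starts whenever the gap between consecutive tweets exceeds a hypothetical `max_gap` threshold, and that information gain is the textbook entropy reduction over city labels. The function names and the threshold are illustrative and are not taken from the paper.

```python
from collections import Counter
from datetime import datetime, timedelta
from math import log2


def split_timeline(tweets, max_gap=timedelta(hours=6)):
    """Group (timestamp, text) pairs into temporal clusters.

    Assumption: a gap longer than max_gap between consecutive tweets
    starts a new cluster. The paper's actual splitting rule may differ.
    """
    clusters = []
    for ts, text in sorted(tweets, key=lambda t: t[0]):
        # Compare with the timestamp of the last tweet in the last cluster.
        if clusters and ts - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append((ts, text))
        else:
            clusters.append([(ts, text)])
    return clusters


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(word, labelled_clusters):
    """Entropy reduction over city labels when splitting on word presence.

    labelled_clusters is a list of (set_of_words, city) pairs.
    """
    labels = [city for _, city in labelled_clusters]
    with_word = [city for words, city in labelled_clusters if word in words]
    without_word = [city for words, city in labelled_clusters if word not in words]
    n = len(labels)
    conditional = sum(
        len(part) / n * entropy(part) for part in (with_word, without_word) if part
    )
    return entropy(labels) - conditional


if __name__ == "__main__":
    # Hypothetical timeline: two morning tweets and one evening tweet.
    timeline = [
        (datetime(2019, 1, 1, 9, 0), "coffee by the harbour"),
        (datetime(2019, 1, 1, 9, 30), "walking to the office"),
        (datetime(2019, 1, 1, 20, 0), "dinner downtown"),
    ]
    for cluster in split_timeline(timeline):
        print([text for _, text in cluster])
```

A word that appears only in clusters from one city yields a high gain, while a word spread evenly across cities yields a gain near zero, which is the intuition behind ranking words by their location implications.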

Original language: English
Journal: IEEE Transactions on Knowledge & Data Engineering
ISSN: 1041-4347
DOI: 10.1109/TKDE.2018.2852764
Publication status: Accepted/In press - 2019

Fingerprint

Location based services
Convolution
Learning systems
Classifiers
Availability

Cite this

@article{1fbabfa8d6374c818503958d9d2a6e66,
title = "Location Inference for Non-geotagged Tweets in User Timelines",
abstract = "As social media users are increasingly going mobile, various location based services (LBS) have been deployed on social media like Twitter. The success of them heavily depends on the availability and accuracy of users' location information. However, only a small fraction of tweets are geo-tagged. Thus, it is necessary to infer locations for tweets in order to attain the purpose of LBS. In this paper, we tackle this problem by scrutinizing Twitter user timelines. First, we split each user's tweet timeline temporally into a number of clusters, each tending to imply a distinct location. Subsequently, we adapt two machine learning models and design classifiers that classify each tweet cluster into one of the pre-defined location classes at the city level. The Bayes based model focuses on the information gain of words with location implications in the user-generated contents. The convolutional LSTM model treats user-generated contents and their associated locations as sequences and employs bidirectional LSTM and convolution operation to infer locations. The two models are evaluated on a large real data set. The results suggest that our models are effective at inferring locations for non-geotagged tweets and the models outperform the state-of-the-art and alternative approaches significantly in terms of inference accuracy.",
keywords = "Adaptation models, Bayes, Feature extraction, Hidden Markov models, LSTM, Location Inference, Location awareness, Twitter, Urban areas",
author = "Pengfei Li and Hua Lu and Nattiya Kanhabua and Sha Zhao and Gang Pan",
year = "2019",
doi = "10.1109/TKDE.2018.2852764",
language = "English",
journal = "IEEE Transactions on Knowledge \& Data Engineering",
issn = "1041-4347",
publisher = "IEEE",

}

Location Inference for Non-geotagged Tweets in User Timelines. / Li, Pengfei; Lu, Hua; Kanhabua, Nattiya; Zhao, Sha; Pan, Gang.

In: IEEE Transactions on Knowledge & Data Engineering, 2019.

TY - JOUR

T1 - Location Inference for Non-geotagged Tweets in User Timelines

AU - Li, Pengfei

AU - Lu, Hua

AU - Kanhabua, Nattiya

AU - Zhao, Sha

AU - Pan, Gang

PY - 2019

Y1 - 2019

AB - As social media users are increasingly going mobile, various location based services (LBS) have been deployed on social media like Twitter. The success of them heavily depends on the availability and accuracy of users' location information. However, only a small fraction of tweets are geo-tagged. Thus, it is necessary to infer locations for tweets in order to attain the purpose of LBS. In this paper, we tackle this problem by scrutinizing Twitter user timelines. First, we split each user's tweet timeline temporally into a number of clusters, each tending to imply a distinct location. Subsequently, we adapt two machine learning models and design classifiers that classify each tweet cluster into one of the pre-defined location classes at the city level. The Bayes based model focuses on the information gain of words with location implications in the user-generated contents. The convolutional LSTM model treats user-generated contents and their associated locations as sequences and employs bidirectional LSTM and convolution operation to infer locations. The two models are evaluated on a large real data set. The results suggest that our models are effective at inferring locations for non-geotagged tweets and the models outperform the state-of-the-art and alternative approaches significantly in terms of inference accuracy.

KW - Adaptation models

KW - Bayes

KW - Feature extraction

KW - Hidden Markov models

KW - LSTM

KW - Location Inference

KW - Location awareness

KW - Twitter

KW - Urban areas

UR - http://www.scopus.com/inward/record.url?scp=85049444046&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2018.2852764

DO - 10.1109/TKDE.2018.2852764

M3 - Journal article

JO - IEEE Transactions on Knowledge & Data Engineering

JF - IEEE Transactions on Knowledge & Data Engineering

SN - 1041-4347

ER -