Location Inference for Non-geotagged Tweets in User Timelines

Pengfei Li; Hua Lu; Nattiya Kanhabua; Sha Zhao; Gang Pan

doi:10.1109/TKDE.2018.2852764

Publication: Contribution to journal › Journal article › Research › peer review

27 Citations (Scopus)

614 Downloads (Pure)

Abstract

Social media platforms such as Twitter have become globally popular over the past decade. Thanks to the high penetration of smartphones, social media users are increasingly going mobile. This trend has fostered various location-based services deployed on social media, whose success depends heavily on the availability and accuracy of users' location information. However, only a very small fraction of tweets on Twitter are geotagged. It is therefore necessary to infer the locations of tweets in order to support such location-based services. In this paper, we tackle this problem by scrutinizing Twitter user timelines in a novel fashion. First, we split each user's tweet timeline temporally into a number of clusters, each of which tends to imply a distinct location. Subsequently, we adapt two machine learning models to our setting and design classifiers that assign each tweet cluster to one of the predefined city-level location classes. The Bayes-based model focuses on the information gain of location-indicative words in user-generated content. The convolutional LSTM model treats user-generated content and the associated locations as sequences and employs a bidirectional LSTM and convolution operations to make location inferences. The two models are evaluated on a large set of real Twitter data. The experimental results suggest that our models are effective at inferring locations for non-geotagged tweets and significantly outperform state-of-the-art and alternative approaches in terms of inference accuracy.
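The abstract describes the pipeline only at a high level. The following Python sketch illustrates its first two ideas under stated assumptions; it is not the authors' code. `split_timeline` groups a user's tweets using a fixed time-gap threshold (`max_gap`, a hypothetical parameter, since the abstract does not state the actual clustering criterion), and `information_gain` scores a word with the textbook information-gain formula over city-labeled clusters. The names `Tweet`, `split_timeline` and `information_gain` are illustrative only, and the sketch omits both the Bayes classifier itself and the convolutional LSTM.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Tweet:
    time: datetime
    text: str


def split_timeline(tweets, max_gap=timedelta(hours=6)):
    """Group a user's tweets into temporal clusters.

    Assumption: a new cluster starts whenever two consecutive tweets are
    more than `max_gap` apart (a hypothetical rule, not the paper's).
    """
    clusters = []
    for tweet in sorted(tweets, key=lambda t: t.time):
        if clusters and tweet.time - clusters[-1][-1].time <= max_gap:
            clusters[-1].append(tweet)  # close in time: same cluster
        else:
            clusters.append([tweet])  # first tweet or large gap: new cluster
    return clusters


def entropy(counts):
    """Shannon entropy (in bits) of a label distribution given as a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def information_gain(word, docs):
    """Information gain of a word's presence with respect to the city label.

    `docs` is a list of (set_of_words, city) pairs, e.g. one pair per
    labeled tweet cluster.
    IG(w) = H(C) - [P(w) H(C | w) + P(not w) H(C | not w)]
    """
    with_w = Counter(city for words, city in docs if word in words)
    without_w = Counter(city for words, city in docs if word not in words)
    n = len(docs)
    # Weighted entropy of the city label after splitting on word presence.
    conditional = sum(
        sum(part.values()) / n * entropy(part)
        for part in (with_w, without_w)
        if part
    )
    return entropy(with_w + without_w) - conditional
```

Under this scoring, a word that occurs only in clusters from one city (such as a landmark name) receives a higher score than a word spread evenly across cities, which receives a score of zero.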

Original language	English
Article number	8403245
Journal	IEEE Transactions on Knowledge and Data Engineering
Volume	31
Issue number	6
Pages (from-to)	1150-1165
Number of pages	16
ISSN	1041-4347
DOI	https://doi.org/10.1109/TKDE.2018.2852764
Status	Published - 2019

Access to document

10.1109/TKDE.2018.2852764

Green Open Access article (accepted manuscript, 3.31 MB)

Other files and links

http://www.scopus.com/inward/record.url?scp=85049444046&partnerID=8YFLogxK

Citation formats

@article{1fbabfa8d6374c818503958d9d2a6e66,
  title = "Location Inference for Non-geotagged Tweets in User Timelines",
  abstract = "Social media like Twitter have become globally popular in the past decade. Thanks to the high penetration of smartphones, social media users are increasingly going mobile. This trend has contributed to foster various location based services deployed on social media, the success of which heavily depends on the availability and accuracy of users' location information. However, only a very small fraction of tweets in Twitter are geo-tagged. Therefore, it is necessary to infer locations for tweets in order to attain the purpose of those location based services. In this paper, we tackle this problem by scrutinizing Twitter user timelines in a novel fashion. First of all, we split each user's tweet timeline temporally into a number of clusters, each tending to imply a distinct location. Subsequently, we adapt two machine learning models to our setting and design classifiers that classify each tweet cluster into one of the pre-defined location classes at the city level. The Bayes based model focuses on the information gain of words with location implications in the user-generated contents. The convolutional LSTM model treats user-generated contents and their associated locations as sequences and employs bidirectional LSTM and convolution operation to make location inferences. The two models are evaluated on a large set of real Twitter data. The experimental results suggest that our models are effective at inferring locations for non-geotagged tweets and the models outperform the state-of-the-art and alternative approaches significantly in terms of inference accuracy.",
  keywords = "Adaptation models, Bayes, Feature extraction, Hidden Markov models, LSTM, Location Inference, Location awareness, Twitter, Urban areas",
  author = "Pengfei Li and Hua Lu and Nattiya Kanhabua and Sha Zhao and Gang Pan",
  year = "2019",
  doi = "10.1109/TKDE.2018.2852764",
  language = "English",
  volume = "31",
  pages = "1150--1165",
  journal = "IEEE Transactions on Knowledge and Data Engineering",
  issn = "1041-4347",
  publisher = "IEEE",
  number = "6",
}

TY  - JOUR
T1  - Location Inference for Non-geotagged Tweets in User Timelines
AU  - Li, Pengfei
AU  - Lu, Hua
AU  - Kanhabua, Nattiya
AU  - Zhao, Sha
AU  - Pan, Gang
PY  - 2019
Y1  - 2019
N2  - Social media like Twitter have become globally popular in the past decade. Thanks to the high penetration of smartphones, social media users are increasingly going mobile. This trend has contributed to foster various location based services deployed on social media, the success of which heavily depends on the availability and accuracy of users' location information. However, only a very small fraction of tweets in Twitter are geo-tagged. Therefore, it is necessary to infer locations for tweets in order to attain the purpose of those location based services. In this paper, we tackle this problem by scrutinizing Twitter user timelines in a novel fashion. First of all, we split each user's tweet timeline temporally into a number of clusters, each tending to imply a distinct location. Subsequently, we adapt two machine learning models to our setting and design classifiers that classify each tweet cluster into one of the pre-defined location classes at the city level. The Bayes based model focuses on the information gain of words with location implications in the user-generated contents. The convolutional LSTM model treats user-generated contents and their associated locations as sequences and employs bidirectional LSTM and convolution operation to make location inferences. The two models are evaluated on a large set of real Twitter data. The experimental results suggest that our models are effective at inferring locations for non-geotagged tweets and the models outperform the state-of-the-art and alternative approaches significantly in terms of inference accuracy.
AB  - Social media like Twitter have become globally popular in the past decade. Thanks to the high penetration of smartphones, social media users are increasingly going mobile. This trend has contributed to foster various location based services deployed on social media, the success of which heavily depends on the availability and accuracy of users' location information. However, only a very small fraction of tweets in Twitter are geo-tagged. Therefore, it is necessary to infer locations for tweets in order to attain the purpose of those location based services. In this paper, we tackle this problem by scrutinizing Twitter user timelines in a novel fashion. First of all, we split each user's tweet timeline temporally into a number of clusters, each tending to imply a distinct location. Subsequently, we adapt two machine learning models to our setting and design classifiers that classify each tweet cluster into one of the pre-defined location classes at the city level. The Bayes based model focuses on the information gain of words with location implications in the user-generated contents. The convolutional LSTM model treats user-generated contents and their associated locations as sequences and employs bidirectional LSTM and convolution operation to make location inferences. The two models are evaluated on a large set of real Twitter data. The experimental results suggest that our models are effective at inferring locations for non-geotagged tweets and the models outperform the state-of-the-art and alternative approaches significantly in terms of inference accuracy.
KW  - Adaptation models
KW  - Bayes
KW  - Feature extraction
KW  - Hidden Markov models
KW  - LSTM
KW  - Location Inference
KW  - Location awareness
KW  - Twitter
KW  - Urban areas
UR  - http://www.scopus.com/inward/record.url?scp=85049444046&partnerID=8YFLogxK
U2  - 10.1109/TKDE.2018.2852764
DO  - 10.1109/TKDE.2018.2852764
M3  - Journal article
SN  - 1041-4347
VL  - 31
SP  - 1150
EP  - 1165
JO  - IEEE Transactions on Knowledge and Data Engineering
JF  - IEEE Transactions on Knowledge and Data Engineering
IS  - 6
M1  - 8403245
ER  -
