TWikiL - The Twitter Wikipedia Link Dataset

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

Abstract

Recent research has shown how strongly Wikipedia is connected to other web applications. For example, search engines rely heavily on surfacing Wikipedia links to satisfy their users' information needs, and volunteer-created Wikipedia content is frequently re-used on social media platforms like Reddit. However, publicly accessible datasets that enable researchers to study the interrelationship between Wikipedia and other platforms are scarce. In addition, most studies focus only on specific points in time and do not consider the historical perspective. To begin solving these problems, we developed TWikiL, the Twitter Wikipedia Link Dataset, which contains all Wikipedia links posted on Twitter in the period from 2006 to January 2021. We extract Wikipedia links from Tweets and enrich the referenced articles with their respective Wikidata identifiers and Wikipedia topic categories. This makes the dataset immediately useful for a large range of scholarly use cases. In this paper, we describe the data collection process, perform an initial exploratory analysis and present a comprehensive overview of how this dataset can be useful for the research community.
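The link-extraction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the regular expression, the function name `extract_wikipedia_links`, and the output fields are assumptions made for illustration, and the enrichment with Wikidata identifiers and topic categories is not covered.

```python
import re
from urllib.parse import unquote, urlsplit

# Hypothetical pattern (an assumption, not taken from the paper): matches
# desktop and mobile Wikipedia article URLs such as
# https://en.wikipedia.org/wiki/... or https://de.m.wikipedia.org/wiki/...
WIKI_URL = re.compile(
    r"https?://[a-z\-]+(?:\.m)?\.wikipedia\.org/wiki/[^\s<>\"']+",
    re.IGNORECASE,
)


def extract_wikipedia_links(text):
    """Return one dict per Wikipedia article link found in `text`."""
    links = []
    for match in WIKI_URL.finditer(text):
        # Drop sentence punctuation that often trails a URL in running text.
        # Limitation: titles that really end in "." or "!" get truncated.
        url = match.group(0).rstrip(".,;:!?")
        parts = urlsplit(url)
        # The first host label is treated as the language edition,
        # e.g. "en" in en.wikipedia.org or "de" in de.m.wikipedia.org.
        language = parts.netloc.split(".")[0].lower()
        # urlsplit keeps "#section" fragments out of the path, so only the
        # article title remains after the "/wiki/" prefix.
        title = unquote(parts.path[len("/wiki/"):]).replace("_", " ")
        links.append({"language": language, "title": title, "url": url})
    return links
```

Applied to the text of a Tweet, the function returns the language edition and decoded article title for each link; these are the keys one would then use to look up a Wikidata identifier in a separate step.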
Original language: English
Title of host publication: Proceedings of ICWSM 2022
Number of pages: 1301
Publication date: 2022
Pages: 1292
Article number: 16(1)
Publication status: Published - 2022

Keywords

  • Twitter
  • Wikipedia
  • Cross-platform analysis
  • Open Data
