Hyphe, a curation-oriented approach to web crawling for the social sciences

Mathieu Jacomy, Paul Girard, Benjamin Ooghe-Tabanou, Tommaso Venturini

Publication: Contribution to book/anthology/report/conference proceeding; Conference article in proceedings; Research; Peer-reviewed

17 Citations (Scopus)

Abstract

The web is a field of investigation for the social sciences, and platform-based studies have long proven their relevance. However, the generic web is rarely studied in itself, though it contains crucial aspects of the embodiment of social actors: personal blogs, institutional websites, hobby-specific media… We realized that some sociologists see existing web crawlers as "black boxes" unsuitable for research, though they are willing to study the broad web. In this paper we present Hyphe, a crawler developed with and for social scientists, with an innovative "curation-oriented" approach. We expose the problems of using web-mining techniques in social science research and show how to overcome them with specific features such as step-by-step corpus building and a memory structure allowing researchers to dynamically redefine the granularity of their "web entities".

Original language: English
Title: Proceedings of the 10th International Conference on Web and Social Media, ICWSM 2016
Number of pages: 4
Publisher: AAAI Press
Publication date: 1 Jan 2016
Pages: 595-598
ISBN (Electronic): 9781577357582
Status: Published - 1 Jan 2016
Published externally: Yes
Event: 10th International Conference on Web and Social Media, ICWSM 2016 - Cologne, Germany
Duration: 17 May 2016 to 20 May 2016

Conference

Conference: 10th International Conference on Web and Social Media, ICWSM 2016
Country/Territory: Germany
City: Cologne
Period: 17/05/2016 to 20/05/2016
Sponsor: Association for the Advancement of Artificial Intelligence (AAAI)
Name: Proceedings of the 10th International Conference on Web and Social Media, ICWSM 2016
