How to Search the Internet Archive Without Indexing It

Nattiya Kanhabua, Philipp Kemkes, Wolfgang Nejdl, Tu Ngoc Nguyen, Felipe Reis, Nam Khanh Tran

Publication: Contribution to book/anthology/report/conference proceeding › Conference article in proceedings › Research › Peer-reviewed

14 Citations (Scopus)
441 Downloads (Pure)

Abstract

Significant parts of our cultural heritage have been produced on the Web in recent years. While easy access to the current Web is a good baseline, optimal access to the past of the Web faces several challenges. These include dealing with large-scale web archive collections, as well as the lack of usage logs, which contain the implicit human feedback most relevant for today’s web search. In this paper, we propose an entity-oriented search system to support retrieval and analysis processes on web archives. We use Bing, searching the current Web, to retrieve a ranked list of results, and we link our search results to the Wayback Machine, thus allowing keyword search on the Internet Archive without processing and indexing its raw content. Our system complements existing web archive search tools through a user interface that comes close to the functionality of modern web search engines (e.g., keyword search, query auto-completion and related query suggestion), with the added benefit that user feedback on the current Web is also taken into account for web archive search. Through extensive experiments, we conduct quantitative and qualitative analyses in order to provide insights that enable further research on and practical applications of web archives.
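As an illustration of the linking step described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the ranked result URLs have already been obtained from a search engine (here a hard-coded list stands in for them), and it rewrites each URL into the public Wayback Machine address pattern `https://web.archive.org/web/<timestamp>/<url>`. The function names `wayback_url` and `link_results` are hypothetical.

```python
from datetime import date

# Public Wayback Machine replay prefix; a timestamp and the original URL follow it.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(url: str, when: date) -> str:
    """Build a Wayback Machine address for `url` near the date `when`.

    The archive is expected to resolve the timestamp to a nearby capture,
    if one exists; this function only builds the address and does not
    check that a capture is available.
    """
    timestamp = when.strftime("%Y%m%d")
    return f"{WAYBACK_PREFIX}/{timestamp}/{url}"


def link_results(ranked_urls: list[str], when: date) -> list[tuple[int, str, str]]:
    """Pair each ranked result (rank starts at 1) with its archive address."""
    return [
        (rank, url, wayback_url(url, when))
        for rank, url in enumerate(ranked_urls, start=1)
    ]


if __name__ == "__main__":
    # Stand-in for a ranked list returned by a current-Web search engine.
    results = ["http://example.org/", "http://example.com/news"]
    for rank, live, archived in link_results(results, date(2016, 9, 5)):
        print(rank, live, archived)
```

Because the ranking comes from the current-Web search engine and only the final links point at the archive, no archived content has to be crawled or indexed locally in this sketch.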
Original language: English
Title: Research and Advanced Technology for Digital Libraries: 20th International Conference on Theory and Practice of Digital Libraries, TPDL 2016, Hannover, Germany, September 5–9, 2016, Proceedings
Publisher: Springer
Publication date: 2016
Pages: 147-160
ISBN (Print): 978-3-319-43996-9
ISBN (Electronic): 978-3-319-43997-6
DOI
Status: Published - 2016
Event: 20th International Conference on Theory and Practice of Digital Libraries - Hannover, Germany
Duration: 5 Sep 2016 to 9 Sep 2016

Conference

Conference: 20th International Conference on Theory and Practice of Digital Libraries
Country/Territory: Germany
City: Hannover
Period: 05/09/2016 to 09/09/2016
Series name: Lecture Notes in Computer Science
Volume: 9819
ISSN: 0302-9743
