TY - GEN
T1 - How to Search the Internet Archive Without Indexing It
AU - Kanhabua, Nattiya
AU - Kemkes, Philipp
AU - Nejdl, Wolfgang
AU - Nguyen, Tu Ngoc
AU - Reis, Felipe
AU - Tran, Nam Khanh
PY - 2016
Y1 - 2016
N2 - Significant parts of our cultural heritage are produced on the Web in recent years. While the easy accessibility to the current Web is a good baseline, optimal access to the past of the Web faces several challenges. This includes dealing with large-scale web archive collections, as well as lacking of usage logs, which contain implicit human feedback most relevant for today’s web search. In this paper, we propose an entity-oriented search system to support retrieval and analysis processes on web archives. We use Bing, searching the current Web, to retrieve a ranked list of results, and we link our search results to the WayBack Machine; thus allowing keyword search on the Internet Archive without processing and indexing its raw content. Our system complements existing web archive search tools through a user interface, which comes close to the functionalities of modern web search engines (e.g., keyword search, query auto completion and related query suggestion), plus the huge benefit of taking user feedback on the current Web into account also for Web Archive search. Through extensive experiments, we conduct quantitative and qualitative analyses in order to provide insights that enable further research on and practical applications of web archives.
AB - Significant parts of our cultural heritage are produced on the Web in recent years. While the easy accessibility to the current Web is a good baseline, optimal access to the past of the Web faces several challenges. This includes dealing with large-scale web archive collections, as well as lacking of usage logs, which contain implicit human feedback most relevant for today’s web search. In this paper, we propose an entity-oriented search system to support retrieval and analysis processes on web archives. We use Bing, searching the current Web, to retrieve a ranked list of results, and we link our search results to the WayBack Machine; thus allowing keyword search on the Internet Archive without processing and indexing its raw content. Our system complements existing web archive search tools through a user interface, which comes close to the functionalities of modern web search engines (e.g., keyword search, query auto completion and related query suggestion), plus the huge benefit of taking user feedback on the current Web into account also for Web Archive search. Through extensive experiments, we conduct quantitative and qualitative analyses in order to provide insights that enable further research on and practical applications of web archives.
U2 - 10.1007/978-3-319-43997-6_12
DO - 10.1007/978-3-319-43997-6_12
M3 - Article in proceeding
SN - 978-3-319-43996-9
T3 - Lecture Notes in Computer Science
SP - 147
EP - 160
BT - Research and Advanced Technology for Digital Libraries
PB - Springer
T2 - 20th International Conference on Theory and Practice of Digital Libraries
Y2 - 5 September 2016 through 9 September 2016
ER -