Estimating Search Engine Index Size Variability: A 9-Year Longitudinal Study

Antal Van den Bosch; Toine Bogers; Maurice De Kunder

doi:10.1007/s11192-016-1863-z

Department of Communication and Psychology

Research output: Contribution to journal › Journal article › Research › peer-review

53 Citations (Scopus)

549 Downloads (Pure)

Abstract

One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine’s index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing’s indices over a nine-year period, from March 2006 until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find that much, if not all, of this variability can be explained by changes in the indexing and ranking infrastructure of Google and Bing. This casts further doubt on whether Web search engines can be used reliably for cross-sectional webometric studies.
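The extrapolation idea in the abstract can be illustrated with a minimal sketch. It assumes that the fraction of documents containing a word is the same in the static corpus as in the engine's index, so the index size is roughly the engine's reported hit count scaled by corpus size over corpus document frequency. The function names, the plain averaging across words, and every number in the usage note are hypothetical illustrations, not the paper's actual procedure or data.

```python
from statistics import mean


def estimate_index_size(hit_count, corpus_df, corpus_size):
    """Extrapolate an index size from a single word.

    hit_count   -- number of results the engine reports for the word
    corpus_df   -- number of corpus documents that contain the word
    corpus_size -- total number of documents in the static corpus
    """
    if corpus_df <= 0 or corpus_size <= 0:
        raise ValueError("corpus_df and corpus_size must be positive")
    # Assumption: corpus_df / corpus_size == hit_count / index_size
    return hit_count * corpus_size / corpus_df


def combined_estimate(observations, corpus_size):
    """Average the per-word estimates (an assumed way to combine them).

    observations -- iterable of (hit_count, corpus_df) pairs, one per word
    """
    estimates = [
        estimate_index_size(hits, df, corpus_size) for hits, df in observations
    ]
    if not estimates:
        raise ValueError("at least one observation is required")
    return mean(estimates)
```

For example, with a hypothetical corpus of 1,000,000 pages, calling `combined_estimate([(10_000_000_000, 500_000), (220_000_000, 10_000)], 1_000_000)` averages one estimate from a very frequent word and one from a rarer word. Using several words of different frequencies dampens the effect of any single noisy hit count.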

Original language	English
Journal	Scientometrics
Volume	107
Issue number	May
Pages (from-to)	839-856
ISSN	0138-9130
DOI	https://doi.org/10.1007/s11192-016-1863-z
Status	Published - 9 Feb 2016

Access to document

10.1007/s11192-016-1863-z

Van den Bosch et al. (2016).pdf (publisher's published version, 688 KB)

Citation formats

@article{cf45c9ebb0f54e2685306a41e6f5cc4a,

title = "Estimating Search Engine Index Size Variability: A 9-Year Longitudinal Study",

abstract = "One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine{\textquoteright}s index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing{\textquoteright}s indices over a nine-year period, from March 2006 until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find that much, if not all of this variability can be explained by changes in the indexing and ranking infrastructure of Google and Bing. This casts further doubt on whether Web search engines can be used reliably for cross-sectional webometric studies.",

keywords = "Search engines, Webometrics, Longitudinal study, Index size estimation",

author = "{Van den Bosch}, Antal and Bogers, Toine and {De Kunder}, Maurice",

year = "2016",

month = feb,

day = "9",

doi = "10.1007/s11192-016-1863-z",

language = "English",

volume = "107",

pages = "839--856",

journal = "Scientometrics",

issn = "0138-9130",

publisher = "Akademiai Kiado Rt.",

number = "May",

}

TY - JOUR

T1 - Estimating Search Engine Index Size Variability

T2 - A 9-Year Longitudinal Study

AU - Van den Bosch, Antal

AU - Bogers, Toine

AU - De Kunder, Maurice

PY - 2016/2/9

Y1 - 2016/2/9

N2 - One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine’s index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing’s indices over a nine-year period, from March 2006 until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find that much, if not all of this variability can be explained by changes in the indexing and ranking infrastructure of Google and Bing. This casts further doubt on whether Web search engines can be used reliably for cross-sectional webometric studies.

AB - One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine’s index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing’s indices over a nine-year period, from March 2006 until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find that much, if not all of this variability can be explained by changes in the indexing and ranking infrastructure of Google and Bing. This casts further doubt on whether Web search engines can be used reliably for cross-sectional webometric studies.

KW - Search engines

KW - Webometrics

KW - Longitudinal study

KW - Index size estimation

U2 - 10.1007/s11192-016-1863-z

DO - 10.1007/s11192-016-1863-z

M3 - Journal article

SN - 0138-9130

VL - 107

SP - 839

EP - 856

JO - Scientometrics

JF - Scientometrics

IS - May

ER -
