August 15, 2005

NCSA Compares Google and Yahoo Index Numbers

This just went up on Slashdot: NCSA Compares Google and Yahoo Index Numbers. Yahoo's numbers look a bit suspect, as you can see.


Eric Baillargeon said...

Hmm... You should check the methodology of this research! It's a joke! For the first two key phrases, where Google shows results and Yahoo shows none, Google's results are simply copies of the Ispell wordlist page, posted for spamming purposes!!!

tdailey said...


I don't see how your comment addresses the methodology. The pages, even if they are spam, are web pages. Yahoo isn't claiming "better quality" pages or "pages without spam." Yahoo is claiming that they index twice as many pages as Google.

So, what's wrong with the methodology? Are the pages you refer to not web pages?

Jean Véronis said...

You can find a detailed analysis of the study's flaws at:

By the way, the NCSA realized the problem and the page now has a strong disclaimer:

The following study was completed by two of Professor Vernon Burton's students at the University of Illinois. Though one of the students previously worked with Professor Burton at the National Center for Supercomputing Applications (NCSA), the study was done outside the scope of any NCSA core projects. When first published online, staff at the NCSA noted several issues with the study, and some revisions have been made to the document to reflect several of these concerns. Changes are detailed at the bottom of this page.

Please note again that this study is not an NCSA publication and was not conducted as part of any NCSA project or under the supervision of NCSA.

A verification study is currently in progress that addresses the presence of "wordlists" and "dictionaries" in the search results, which many rightly point out could be a source of bias. The new study filters out any dictionary or wordlist results. Preliminary results (from 7,000 test queries) indicate that this verification study confirms the conclusions of the original study, but final results are still forthcoming.

But this has not yet gone to Slashdot ;-)