Last Friday we broke the story on Google introducing a new advanced search filter named Reading Levels. In short, Google would let you filter the search results based on how basic, intermediate or advanced the web results are.
Nundu, Google's Product Manager for Web Search, explained how Google decides that one web page is basic while another is advanced. They hired teachers to grade web pages!
Nundu said:
The feature is based primarily on statistical models we built with the help of teachers. We paid teachers to classify pages for different reading levels, and then took their classifications to build a statistical model. With this model, we can compare the words on any webpage with the words in the model to classify reading levels. We also use data from Google Scholar, since most of the articles in Scholar are advanced.
Yes, so they hired teachers to sit at a computer, read web pages and score them by reading level. Google then used all this data to build a statistical model that applies the grades (or reading levels) more widely to all web documents.
Google admitted that it also uses Google Scholar documents as a benchmark for the "advanced" reading level.
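To make the idea concrete, here is a minimal sketch of how a word-based model like the one Nundu describes could work. This is not Google's implementation, which has not been published. The naive Bayes approach, the function names (`tokenize`, `train`, `classify`) and the tiny sample pages are all assumptions made up for illustration. The "advanced" sample stands in for a Scholar-style article.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stand-ins for teacher-graded pages (not real Google data).
SAMPLE_PAGES = [
    ("the cat sat on the mat and the dog ran to the park", "basic"),
    ("the committee reviewed the budget and approved the new schedule",
     "intermediate"),
    # Assumed to play the role of a Google Scholar article.
    ("the stochastic gradient estimator exhibits asymptotic variance "
     "under regularization", "advanced"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_pages):
    """Count words per reading level from (text, level) pairs."""
    word_counts = defaultdict(Counter)
    page_counts = Counter()
    for text, level in labeled_pages:
        word_counts[level].update(tokenize(text))
        page_counts[level] += 1
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, page_counts, vocab


def classify(text, model):
    """Return the level whose word statistics best match the text."""
    word_counts, page_counts, vocab = model
    total_pages = sum(page_counts.values())
    best_level, best_score = None, -math.inf
    for level in page_counts:
        total_words = sum(word_counts[level].values())
        # Start from the share of training pages at this level.
        score = math.log(page_counts[level] / total_pages)
        for word in tokenize(text):
            if word not in vocab:
                continue  # skip words the model has never seen
            # Add-one smoothing keeps unseen word/level pairs above zero.
            score += math.log(
                (word_counts[level][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_level, best_score = level, score
    return best_level


model = train(SAMPLE_PAGES)
print(classify("the dog sat on the mat", model))
print(classify("asymptotic variance of the gradient estimator", model))
```

With only three training pages this is a toy, but it shows the shape of the approach: graded pages go in, word statistics come out, and any new page is scored against those statistics.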
I found it pretty cool that Google shared this information.
Forum discussion at Google Web Search Help.