Bill Slawski and Gary Price (who else?) have each written about a recent Google patent or research paper. Bill wrote "How does Google Pick Snippets for Your Pages to Show in Search Results?" and Gary wrote "How Google Identifies, Detects & Understands Music Files." Both cover interesting concepts and are worth a read. Let's start with Bill's snippets:
Bill points to a newly awarded Google patent named "Methods and systems for generating textual information" and then breaks that document down into plain English. Here is Bill's "takeaway":
A takeaway from this patent may be that you should pay attention carefully to the text that surrounds and supports phrases on your pages that you think might be terms that people will search for to find those pages, and that may rank well and show up in search results that people will see.
Forum discussion on Snippets at Sphinn.
Gary points to a paper written by Google's NYC office, named Robust Music Identification, Detection, and Analysis (PDF). The paper goes through the various ways Google can identify, detect and understand music files. What is also cool is that Google apparently has a method of detecting "duplicate content" in music files: it looks at a file's pitches, frequencies and tones and compares them against what else matches in the Google Music or Video index. Google calls that piece "Factor Uniqueness Analysis".
Forum discussion on Music detection at Sphinn.