Well, patent documents seem to be raining down on us from all directions. Barry posted on this news this morning, but I want to expand on it for a more thorough look. The link comes via the Cre8asite forums, where the talented Bragadocchio highlighted a Yahoo patent detailing how Yahoo plans to give searchers an informal editorial position to help shape the results. They want to know how you search, the nitty gritty of your habits, more or less. If it makes for a better Yahoo search engine, I think we should probably consider it. However, if you thought the Google patent was hard to read, the language in the Yahoo patent is way worse and reads like a badly translated VCR manual. It introduces us to such terms and concepts as superunits, constituent units, concept networks, and signatures.

There is an underlying message in the patent that may not be picked up at first: a lot of the document is about Local Search, and it is interesting to see how Yahoo is approaching it. Many of the examples are purely about location, "new york, paris, hawaii" to name just a few. It is definitely not restricted to local, but local seems to play a big part in what the patent would power.
Here is the abstract from the document:
A concept network is generated from a set of queries by parsing the queries into units and defining various relationships between the units, e.g., based on patterns of units that appear together in queries. From the concept network, various similarities between different units can be detected, and units that have some identifying characteristic(s) in common may be grouped into superunits. For each superunit, there is a corresponding signature that defines the identifying characteristic(s) of the group. A query can be processed by identifying constituent units, determining the superunit membership of some or all of the constituent units, and using that information to formulate a response to the query.
Here is a passage from the patent that illustrates the abstract better:
What human beings think in terms of are natural concepts. For example, "hawaii" and "new york city" are vastly different queries in terms of length as measured by number of words but for a human being they share one important characteristic: they are each made up of one concept. In contrast, a person regards the query "new york city law enforcement" as fundamentally different because it is made up of two distinct concepts: "new york city" and "law enforcement." Human beings also think in terms of logical relationships between concepts. For example, "law enforcement" and "police" are related concepts since the police are an important agency of law enforcement; a user who types in one of these concepts may be interested in sites related to the other concept even if those sites do not contain the particular word or phrase the user happened to type. As a result of such thinking patterns, human beings by nature build queries by entering one or more natural concepts, not simply a variably long sequence of single words, and the query generally does not include all of the related concepts that the user might be aware of.
That last part does a good job of explaining some of what Yahoo is trying to accomplish, but it is only a small part. The patent is extremely thorough and takes real time to read all the way through.
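To make the "one concept versus two concepts" idea concrete, here is a minimal Python sketch of splitting a query into units. This is my own illustration, not code from the patent: the KNOWN_UNITS set, the MAX_UNIT_WORDS limit, and the greedy longest-match approach are all assumptions, and the patent describes deriving units from query logs rather than from a hard-coded list.

```python
# Hypothetical list of known units (assumption for illustration only;
# the patent derives units from real query data).
KNOWN_UNITS = {"new york city", "new york", "law enforcement", "hawaii", "police"}
MAX_UNIT_WORDS = 3  # longest phrase, in words, that we try to match


def split_into_units(query, known_units=KNOWN_UNITS, max_words=MAX_UNIT_WORDS):
    """Greedy longest-match segmentation of a query into units."""
    words = query.lower().split()
    units = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shrink it.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            # A single word is always accepted as a fallback unit.
            if n == 1 or candidate in known_units:
                units.append(candidate)
                i += n
                break
    return units


print(split_into_units("new york city law enforcement"))
# ['new york city', 'law enforcement']
print(split_into_units("hawaii"))
# ['hawaii']
```

The four-word query comes back as two units and the one-word query as one, which matches the way the patent says a person would read them.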
The document also mentions co-occurrence as stated below:
To establish an association between units, a minimum frequency of co-occurrence may be required.
Roughly translated: in order to build a thesaurus of terms, Yahoo needs to set a minimum number of times that units must appear together before it treats them as semantically connected.
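A rough sketch of what a minimum co-occurrence frequency could look like in practice, again in Python. The function name associated_pairs, the min_cooccurrence parameter, and the sample queries are hypothetical; the patent does not give a threshold value or this exact counting scheme.

```python
from collections import Counter
from itertools import combinations


def associated_pairs(queries_as_units, min_cooccurrence=2):
    """Return unit pairs that appear together in at least min_cooccurrence queries."""
    counts = Counter()
    for units in queries_as_units:
        # Count each unordered pair of distinct units once per query.
        for a, b in combinations(sorted(set(units)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_cooccurrence}


# Made-up queries, already split into units.
queries = [
    ["new york city", "hotels"],
    ["new york city", "hotels"],
    ["new york city", "police"],
    ["hawaii", "hotels"],
]
print(associated_pairs(queries))
# {('hotels', 'new york city'): 2}
```

Only the pair seen twice survives the cutoff. Pairs seen once are treated as noise, which is presumably the point of requiring a minimum.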
Here is an example from the patent of some of this in use:
[0069] For example, consider a case where users search for information about their favorite musical performers. Typically, these users would construct a query that includes the name of the performer (e.g., "Avril Lavigne" or "Celine Dion" or "Matchbox Twenty") and also some other words reflecting the type of information sought, such as "lyrics", "mp3", "guitar tabs", "discography", and so on; these other words are neighbor units that would tend to appear with names of different performers. Based on the occurrence of similar neighbor units, superunit seed module 412 groups the performer names into a cluster.
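The grouping described in that paragraph can be sketched roughly as follows. To be clear, this is not how the patent's superunit seed module works internally; the excerpt does not say. I am assuming a simple Jaccard similarity over neighbor units and a greedy one-pass grouping, and the names neighbor_sets, jaccard, seed_clusters, and the 0.3 threshold are all made up for illustration.

```python
from collections import defaultdict


def neighbor_sets(queries_as_units):
    """Map each unit to the set of other units it has appeared with."""
    neighbors = defaultdict(set)
    for units in queries_as_units:
        for u in units:
            neighbors[u].update(v for v in units if v != u)
    return neighbors


def jaccard(a, b):
    """Size of the overlap divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def seed_clusters(queries_as_units, candidates, min_similarity=0.3):
    """Greedily group candidate units whose neighbor units look alike."""
    neighbors = neighbor_sets(queries_as_units)
    clusters = []
    for unit in candidates:
        for cluster in clusters:
            # Compare against the cluster's first member as its representative.
            if jaccard(neighbors[unit], neighbors[cluster[0]]) >= min_similarity:
                cluster.append(unit)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([unit])
    return clusters


# Made-up queries, already split into units.
queries = [
    ["avril lavigne", "lyrics"], ["avril lavigne", "mp3"],
    ["celine dion", "lyrics"], ["celine dion", "mp3"], ["celine dion", "discography"],
    ["matchbox twenty", "lyrics"], ["matchbox twenty", "guitar tabs"],
    ["new york city", "hotels"], ["hawaii", "hotels"],
]
names = ["avril lavigne", "celine dion", "matchbox twenty", "new york city", "hawaii"]
print(seed_clusters(queries, names))
# [['avril lavigne', 'celine dion', 'matchbox twenty'], ['new york city', 'hawaii']]
```

The performers end up together because they share neighbors like "lyrics" and "mp3", and the two places end up together because they share "hotels". That second group is the kind of thing that would matter for Local Search.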
There is a whole lot more contained in the patent, obviously. It also talks about a Content Analysis System, a Concept Network Builder, and a Superunit Seed Module (?).
Possible applications for use of Superunits:
1. Resolving Ambiguity
2. Suggesting Related Searches
3. Suggesting "Sideways" Searches
4. Resolving Spelling Errors
5. Supporting Directory-Based Searching
6. Advertisements
Check out the Yahoo Patent on Systems and methods for search processing using superunits