On Friday, Google published a blog post titled "Demystifying the 'duplicate content penalty'." The post largely confirms what many of us have been saying for a long time: Google does not typically penalize sites for duplicate content, but duplicate content can keep certain pages from ranking well. Non-malicious duplicate content refers to pages that are very similar to each other, such as filtered search results or product option pages. Here is how Google says it handles duplicate content:
- When we detect duplicate content, such as through variations caused by URL parameters, we group the duplicate URLs into one cluster.
- We select what we think is the "best" URL to represent the cluster in search results.
- We then consolidate properties of the URLs in the cluster, such as link popularity, to the representative URL.
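The three steps can be made concrete with a toy sketch in Python. This is not Google's algorithm, which the post does not spell out. For illustration only, it assumes that duplicates differ solely in their query parameters, that "link popularity" is a plain inbound-link count, and that the "best" URL is the one with the most links (ties go to the shorter URL). The function names `canonical_key` and `cluster_and_consolidate` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def canonical_key(url: str) -> str:
    """Hypothetical grouping key: the URL with its query string and fragment dropped.

    Assumption: pages that differ only by URL parameters are duplicates.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, "", ""))


def cluster_and_consolidate(link_counts: dict[str, int]) -> dict[str, tuple[str, int]]:
    """Return {cluster key: (representative URL, consolidated link count)}."""
    # Step 1: group duplicate URLs into clusters.
    clusters: dict[str, list[str]] = defaultdict(list)
    for url in link_counts:
        clusters[canonical_key(url)].append(url)

    result = {}
    for key, urls in clusters.items():
        # Step 2: pick a representative. Assumed rule: most inbound links,
        # then shortest URL, then alphabetical order as a final tie-breaker.
        rep = min(urls, key=lambda u: (-link_counts[u], len(u), u))
        # Step 3: consolidate link popularity onto the representative.
        total = sum(link_counts[u] for u in urls)
        result[key] = (rep, total)
    return result


if __name__ == "__main__":
    # Made-up example data: three variants of one product page plus an unrelated page.
    links = {
        "https://example.com/shoes?color=red": 5,
        "https://example.com/shoes?color=blue": 2,
        "https://example.com/shoes": 8,
        "https://example.com/hats": 3,
    }
    for key, (rep, total) in cluster_and_consolidate(links).items():
        print(key, "->", rep, total)
```

In this toy version every URL in a cluster contributes its links to the representative. The post's caveat is that, in practice, some of those signals may not be consolidated.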
In other words, Google decides for you which URL is the "best" one to show in search results. If your pages get clustered together, the URL you consider best may not be the one Google surfaces. Google also consolidates the link popularity of the clustered URLs onto the representative URL, but it may miss some of them, and that can hurt a bit.
In the threads, some SEOs are debating what counts as a Google "penalty," but I won't get into that.
Forum discussion at Sphinn and WebmasterWorld.