Getting Rid of Duplicate Content Issues Once and For All

Nov 13, 2008 - 1:51 pm
Filed Under PubCon 2008

Moderator: Rand Fishkin

Rahul Lahiri, Vice President of Search Product Management at Ask, is a maybe...

Ben D'Angelo, Software Engineer at Google, is kicking things off.

Duplicate content issues include multiple URLs pointing to the same page or to very similar pages, and sites for different countries in the same language. Duplicate content also shows up across other sites, as syndicated content and scraped content.

The ideal situation is one URL for one piece of content.

Examples of duplicates include www vs. no-www, session IDs, URL parameters, print-version pages, and CNAMEs. Then you have similar content on different URLs: using a manufacturer's database of pictures and descriptions, or sites in different countries with the same language.
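To make the "one URL for one piece of content" idea concrete, here is a toy Python sketch of URL canonicalization, the sort of normalization the panelists are describing. The parameter names (sessionid, phpsessid, sid) and the choice of no-www as canonical are assumptions for illustration, not anything from the session:

    # Toy URL canonicalization sketch. TRACKING_PARAMS names are assumed.
    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    TRACKING_PARAMS = {"sessionid", "phpsessid", "sid"}  # assumed names

    def canonicalize(url):
        scheme, netloc, path, query, _ = urlsplit(url)
        netloc = netloc.lower()
        # Collapse the www vs. no-www split onto one host.
        if netloc.startswith("www."):
            netloc = netloc[4:]
        # Drop session parameters and sort the rest, so reordered
        # parameters map to the same URL.
        params = [(k, v) for k, v in parse_qsl(query)
                  if k.lower() not in TRACKING_PARAMS]
        return urlunsplit((scheme, netloc, path, urlencode(sorted(params)), ""))

    print(canonicalize("http://www.example.com/page?b=2&a=1&sessionid=abc"))
    # -> http://example.com/page?a=1&b=2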

How does Google handle duplicate content? The general idea is that they cluster duplicate pages together and choose the "best" page to represent the cluster. They have different filters for different types of duplicate content. This is not a penalty, just a filter.
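None of the engines disclose their actual filters, but a common textbook approach to near-duplicate detection is shingling plus Jaccard similarity. A minimal sketch, where the 4-word shingle size and the 0.9 threshold are assumptions rather than anything Google has confirmed:

    # Minimal near-duplicate sketch: word shingles + Jaccard similarity.
    def shingles(text, k=4):
        words = text.split()
        return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 1.0

    def is_near_duplicate(text1, text2, threshold=0.9):
        return jaccard(shingles(text1), shingles(text2)) >= threshold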

What can you do about this?

- For exact dups, use a 301 redirect (see the sketch after this list).
- For near duplicates, noindex them or block them in robots.txt.
- For domains by country: note that a different language is not duplicate content; use unique content specific to each country, different TLDs, and the geo-targeting setting in webmaster tools.
- Try not to put extraneous parameters in your URLs.
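Here is a minimal sketch of the 301 fix for the www vs. no-www case, using only Python's standard library; in practice you would do this in your web server config rather than in application code. The canonical host www.example.com is a placeholder:

    # Minimal 301 sketch for the www vs. no-www split (placeholder host).
    from http.server import BaseHTTPRequestHandler, HTTPServer

    CANONICAL_HOST = "www.example.com"  # assumed canonical host

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            host = self.headers.get("Host", "")
            if host != CANONICAL_HOST:
                # Exact duplicate on another hostname: permanent redirect.
                self.send_response(301)
                self.send_header("Location", "http://%s%s" % (CANONICAL_HOST, self.path))
                self.end_headers()
            else:
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.end_headers()
                self.wfile.write(b"canonical page")

    if __name__ == "__main__":
        HTTPServer(("", 8000), RedirectHandler).serve_forever()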

There are also things like duplicate meta tags and titles.

What about other sites that cause duplicate content? What if you syndicate your content out? One tip: make sure the syndicated copy includes a link back to the original article or content. Maybe also just give them a summary. If you syndicate others' content, then the reverse advice applies.

Scrapers are likely not to impact you; it is possible, but rare. If they do, you can file a DMCA request and/or a spam report.

Priyank Garg, Director of Product Management, Yahoo! Search, is keeping his presentation short because he lost his voice.

Yahoo does filter dups throughout all steps in the pipeline. He shows some examples... They classify most duplicate content as "accidental." Soft 404s (error pages that do not return a real 404 status) are one of the largest sources of duplicates. There are also abusive forms, like scrapers.
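A quick way to test for the soft-404 problem Priyank describes: request a URL that cannot exist and see whether the server answers with a real 404. A sketch, with example.com standing in for your own site:

    # Soft-404 probe: a 200 for a junk URL means error pages are served
    # as real pages, so every bad link becomes crawlable duplicate content.
    import uuid
    from urllib.request import urlopen
    from urllib.error import HTTPError

    def has_soft_404(site):
        probe = "%s/%s-should-not-exist" % (site, uuid.uuid4().hex)
        try:
            return urlopen(probe).status == 200
        except HTTPError:
            return False  # a genuine 404/410 is the healthy case

    print(has_soft_404("http://example.com"))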

He then points to Yahoo tools, like Site Explorer. The dynamic URL rewrite tool rocks, and so does URL removal.

Derrick Wheeler, Senior Search Engine Optimization Architect at Microsoft, is last up.

Duplicate content is his worst nightmare. CIRTA = crawl, index, rank, traffic, action. They have 180 million URLs in Live Search, 80 million in Google, and a few in Yahoo, because each engine filters them out differently.

- Consider that you might need to detect when an engine is coming to your site, like cloaking; in very specific cases it is helpful, such as dropping session IDs.
- Know your parameters.
- Always link to your parameters in the same order.
- Dig into the search results for your site and you can find things there.
- Exclude dups using robots.txt or noindex, nofollow, etc.
- Don't assume engines can't find JavaScript.
- Find a tool that will crawl your site, so you can see how an engine will look at your site (see the sketch after this list).
- Focus on your strong URLs first.
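On the "crawl your own site" tip, here is a bare-bones sketch that fetches a handful of URL variants and groups them by a hash of the response body to surface exact duplicates. The URL list is made up for illustration; a real crawler would discover links itself:

    # Group URL variants by body hash to find exact duplicates.
    import hashlib
    from collections import defaultdict
    from urllib.request import urlopen

    urls = [  # placeholder variants of the same page
        "http://example.com/page",
        "http://www.example.com/page",
        "http://example.com/page?print=1",
    ]

    groups = defaultdict(list)
    for url in urls:
        body = urlopen(url).read()
        groups[hashlib.md5(body).hexdigest()].append(url)

    for digest, members in groups.items():
        if len(members) > 1:
            print("duplicates:", members)  # candidates for a 301 or noindex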

Those are his key points; he's heading out now, he has a meeting.

 
