Top SEOs Analyze Glorified Scraper Sites After May Day

Jul 13, 2010 - 8:44 am

WebmasterWorld's administrator, Tedster, started a thread that takes a deeper look at the May Day update by examining sites that should have been hit by the update but were not.

Tedster does something you rarely see in a WebmasterWorld thread: he picks apart a specific site that is doing well. Several well-known and respected SEOs then join in to discuss why those sites are doing well after the May Day update while others are not.

He posted, in part:

When Mayday first dropped on us, there was a sudden INCREASE in rankings for mash-up sites. You can see examples of what I'm talking about at daymix.com, leapfish.com and picdigger.com and Alexa shows their increases in traffic.

These are often sites with some substantial financing, and even relatively famous owners or CEOs. But to my view, they are a plague on the web and in no way offer the "better long tail results" Google was aiming for.

As one example - do a Google search for site:daymix.com webmasterworld - I currently see 297 pages built from bits and pieces of our content. Try it for other domains and you often see much the same thing.

The goal: figure out why these sites are doing well in Google, and replicate it so your scraper can do well also.

Here are some, but not all, of the responses from top SEOs on what they feel is working for these sites:

The site in question does do quite a bit of linking out to other sites that provide additional information within the mashed up content. Clicking the link goes to the site the content was ripped from. All the links are nofollowed so I wonder is this something we need to take another look at.

Adding more outgoing links to provide more information on the subject/product the page is about.

Daymix doesn't just scrape the Google serps pages and lift the titles and descriptions of the highest ranking/most relevant pages for a query, though, the way scrapers used to. It emulates Universal and scrapes the highest ranking/most relevant sources for different types of data that make up a Google serps page.

Daymix displays a mixture of web, news, blogs, images, videos, Twitter content, etc... and it's good enough, eg, to know when Twitter content might be appropriate for a query and when not; and what the most authoritative sites in a given field are to scrape. Apparently, the vocabulary and media mix is attractive to Google.

He then says this is similar to Google Place Pages.

I took a look at the daymix site, and one thing I noticed is that when I looked at a result that specifically brought up my site, I have a script that displays the user agent of the visitor, and that the user agent is listed as googlebot. So I can confirm that they are scraping Google SERPS, or they are changing their robots names, and not obeying robots.txt.
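A user-agent string is easy to fake, so the observation in that comment cannot by itself tell a real Googlebot visit from a scraper borrowing the name. One common way to check is a reverse DNS lookup on the visitor's IP followed by a forward lookup on the returned hostname. Here is a minimal Python sketch of that idea. The function names and the injectable lookup parameters are hypothetical (they are not from the thread), the hostname suffixes are an assumption about how Google names its crawler hosts, and the default forward lookup only handles IPv4.

```python
import socket

# Assumption: genuine Googlebot hosts resolve to names under these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def claims_googlebot(user_agent):
    """True if the user-agent string merely claims to be Googlebot."""
    return "googlebot" in user_agent.lower()


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Check a Googlebot claim with a reverse lookup, then a forward lookup.

    reverse_lookup(ip) -> hostname and forward_lookup(hostname) -> list of IPs
    are hypothetical hooks; they default to the standard socket calls.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        # Step 1: the IP must map back to a Google-owned hostname.
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: that hostname must resolve back to the same IP.
        return ip in forward_lookup(host)
    except OSError:
        # A failed DNS lookup is treated as "not verified".
        return False
```

A visitor whose user agent passes `claims_googlebot` but whose IP fails `is_verified_googlebot` is only using the Googlebot name, which is the second possibility the commenter raises.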

Finally, Aaron Wall:

They fund a lot of the duplication...and they need to focus more on ways to promote / subsidize the cost of quality & encourage it. Minimizing the role of the scrape and mash game would be a big step in that direction.

But if end users don't know the difference (and don't understand the business connections) does it harm Google to make the media ecosystem weaker and more desperate for negotiations? I see it as the strategy of funding a third party to make a future partner weaker so you have more leverage at the bargaining table. But some might claim that is a cynical way of looking at it :D

This thread is only going to get better, so keep an eye on it.

Forum discussion at WebmasterWorld.

Update: I didn't realize that this thread was private. Now that I have posted the quotes, removing them doesn't make sense (they are out there in the feeds already). Trust me, there is a ton more discussion in the thread. I only pulled out excerpts, so this is one reason to become a paid member of WebmasterWorld.

 
