Almost two years ago, Rand Fishkin from Moz posted about looking into building a spam detection tool to help sites figure out (1) who not to get links from, (2) which bad links to remove and (3) how Google may determine if a site is spammy and why.
The industry was torn, thinking this might just be Rand's way of doing automated "outing," but the truth is, a tool like this can be useful to SEOs who do link forensics (yes, I used that word).
Rand announced the new paid tool on the Moz blog yesterday. Here is a quick video of how it works:
In short, it looks at just 17 different factors, which Moz calls flags. The more flags a site trips, the spammier its score.
Here are the flags Moz uses:
- Low mozTrust to mozRank ratio: Sites with low mozTrust compared to mozRank are likely to be spam.
- Large site with few links: Large sites with many pages tend to also have many links and large sites without a corresponding large number of links are likely to be spam.
- Site link diversity is low: If a large percentage of links to a site are from a few domains it is likely to be spam.
- Ratio of followed to nofollowed subdomains/domains (two separate flags): Sites with a large number of followed links relative to nofollowed are likely to be spam.
- Small proportion of branded links (anchor text): Organically occurring links tend to contain a disproportionate amount of branded keywords. If a site does not have a lot of branded anchor text, it's a signal the links are not organic.
- Thin content: If a site has a relatively small ratio of content to navigation chrome it's likely to be spam.
- Site mark-up is abnormally small: Non-spam sites tend to invest in rich user experiences with CSS, JavaScript and extensive mark-up. Accordingly, a large ratio of text to mark-up is a spam signal.
- Large number of external links: A site with a large number of external links may look spammy.
- Low number of internal links: Real sites tend to link heavily to themselves via internal navigation and a relative lack of internal links is a spam signal.
- Anchor text-heavy page: Sites with a lot of anchor text are more likely to be spam than those with more content and fewer links.
- External links in navigation: Spam sites may hide external links in the sidebar or footer.
- No contact info: Real sites prominently display their social and other contact information.
- Low number of pages found: A site with only one or a few pages is more likely to be spam than one with many pages.
- TLD correlated with spam domains: Certain TLDs are more spammy than others (e.g. .pw).
- Domain name length: A long subdomain name like "bycheapviagra.freeshipping.onlinepharmacy.com" may indicate keyword stuffing.
- Domain name contains numerals: Domain names with numerals may be automatically generated and therefore spam.
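To make the flag-counting idea concrete, here is a rough Python sketch of how a handful of these flags could be checked and tallied. This is not Moz's implementation. The `SiteStats` class, the `spam_flags` function and every threshold below are hypothetical, since Moz's actual cutoffs are not given here, and the link diversity check uses the share of links from the single biggest linking domain as a stand-in.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration; Moz's real cutoffs are not published here.
SPAMMY_TLDS = {"pw"}          # "pw" is the only TLD named in the list above
MAX_HOSTNAME_LENGTH = 30      # assumed cutoff for a "long" domain name
MIN_PAGES = 3                 # assumed cutoff for "only one or a few pages"
MAX_TOP_DOMAIN_SHARE = 0.5    # assumed cutoff for low link diversity


@dataclass
class SiteStats:
    """Hypothetical container for the few inputs this sketch needs."""
    hostname: str
    pages_found: int
    links_by_domain: dict  # linking domain -> number of links from it


def spam_flags(site: SiteStats) -> list:
    """Return the names of the flags this site trips (5 of the 17 only)."""
    flags = []
    host = site.hostname.lower()

    if host.rsplit(".", 1)[-1] in SPAMMY_TLDS:
        flags.append("TLD correlated with spam domains")
    if len(host) > MAX_HOSTNAME_LENGTH:
        flags.append("Domain name length")
    if any(ch.isdigit() for ch in host):
        flags.append("Domain name contains numerals")
    if site.pages_found < MIN_PAGES:
        flags.append("Low number of pages found")

    # Proxy for link diversity: share of all links coming from the top domain.
    total_links = sum(site.links_by_domain.values())
    if total_links:
        top_share = max(site.links_by_domain.values()) / total_links
        if top_share > MAX_TOP_DOMAIN_SHARE:
            flags.append("Site link diversity is low")

    return flags


if __name__ == "__main__":
    site = SiteStats("win4free.pw", pages_found=1, links_by_domain={})
    tripped = spam_flags(site)
    print(len(tripped), "flags:", tripped)
```

The number of flags returned plays the role of the score: the longer the list, the spammier the site looks. A real tool would need all 17 checks and data-driven thresholds.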
What do you think? I have yet to play directly with the tool.
Forum discussion at Twitter.
Image credit to BigStockPhoto for taekwondo fighter