With all this talk about clustering and canonicalization today, what happens if you get it wrong, or your CMS breaks and serves the wrong thing to Google? We know that consistency, and communicating it clearly to Google, is super important. Well, Google does try to handle canonicals that seem broken or flat-out wrong.
This came up in the excellent Search Off The Record interview of Allan Scott from the Google Search team, who works specifically on duplication within Google Search. Martin Splitt and John Mueller from Google interviewed Allan.
At the 27:57 minute mark Allan spoke about some edge cases and how Google deals with them. In short, Google does try to deal with broken or invalid canonicalization signals. "We have some validation in place to try to break rel="canonical" when we think they're wrong," Allan said.
Here is what he said:
When I see people who put junk into the rel="canonical" field. Sometimes it's a script gone wrong and you can see that, "Oh there was supposed to be some sort of variable evaluation that didn't happen." You see like $variable name or something, and then all the rel="canonical" on the site are suddenly pointing to hostname/variable. Or, in another case, I've seen people just leave the field empty and that has a meaning. So I think the parser actually turns it into just a forward slash.
It should be a relative path. But I think it actually goes down to the root of the server. It's basically the same as saying, "Please wipe my site out, okay?" You have to be really careful. I should be clear here. We have some validation in place to try to break rel="canonical" when we think they're wrong. But this is another iceberg. Like, we have a very old feature that is essentially being leaned on to do this, and the new feature that we would like to use to do it has been in development for years at this point. Are we ever going to have good rel="canonical" validation? I don't know, but in the meantime, the one we've got is imperfect and, if you make mistakes, will catch some of them and will let some of them through.
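To make the mistakes Allan describes concrete, here is a minimal sketch of a check you could run on your own pages before Google ever sees them. It is not Google's validation, which is not public. The names check_canonical and CanonicalCollector, the placeholder patterns, and the example.com URLs are all assumptions made up for illustration. It flags three things from the quote: an unevaluated template variable, an empty field, and a canonical that points at the root of the server.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Assumed placeholder styles: $name, ${name} and {{ name }}. Adjust for your CMS.
UNRESOLVED = re.compile(r"\$\{?\w+\}?|\{\{.*?\}\}")


class CanonicalCollector(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.hrefs.append(attr.get("href"))


def check_canonical(html, page_url):
    """Return a list of human-readable problems found for one page."""
    collector = CanonicalCollector()
    collector.feed(html)
    problems = []
    if len(collector.hrefs) > 1:
        problems.append("multiple rel=canonical tags")
    for href in collector.hrefs:
        value = (href or "").strip()
        if not value:
            # The "field left empty" case from the interview.
            problems.append("empty canonical href")
            continue
        if UNRESOLVED.search(value):
            # The "script gone wrong" case, e.g. hostname/$variable.
            problems.append(f"unevaluated template variable in {value!r}")
            continue
        target = urlsplit(urljoin(page_url, value))
        page = urlsplit(page_url)
        if target.path in ("", "/") and page.path not in ("", "/"):
            # A deep page declaring the root as its canonical.
            problems.append("canonical points at the site root")
    return problems


if __name__ == "__main__":
    sample = '<link rel="canonical" href="https://example.com/$variable">'
    print(check_canonical(sample, "https://example.com/shoes/red"))
```

A check like this is only a heuristic, but running it across a crawl of your own site after a template or CMS change would catch the site-wide cases Allan mentions before they have a chance to affect clustering.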
I think that is all from our canonical day at this site, thanks to Google's interview of Allan Scott.
Here is the video:
Forum discussion at X.