
Did you know that Google uses around 40 different signals to perform canonicalization in Google Search? These include not only numerous types of redirects and the rel canonical attribute, but also things like x-default, sitemaps, links, PageRank and other signals.
Google did not list all the signals but said the most important signals used for canonicalization are redirects, rel-canonical attributes and maybe x-default. There was also a signal named "Redirect to Shorter" that Google recently killed off.
This came up in the excellent Search Off The Record interview of Allan Scott from the Google Search team, who works specifically on duplication within Google Search. Martin Splitt and John Mueller from Google interviewed Allan.
Allan said at the 3:45 minute mark into the interview, "I'm not sure what the exact number is right now because it goes up and down, but I suspect it's somewhere in the neighborhood of 40."
John then joked, "Okay. Well, now our listeners will be making spreadsheets with 40 signals like they used to do with those 200 ranking signals that we had."
As for Redirect to Shorter, Allan said it was killed off because "it had a really bad interaction with HTTP/HTTPS, because if you had conflicting signals come from the webmaster, this one would push you to HTTP."
The two most common conflicting signals are canonical tags and redirects. When those disagree, Google will look at other signals like sitemaps, PageRank, etc., he added.
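If you want to check your own site for the kind of conflicting signals Allan described, here is a minimal Python sketch. It is only an audit helper and makes no claim about how Google weighs signals. The function name `find_signal_conflicts` and the signal names ("redirect", "rel_canonical", "sitemap") are hypothetical labels made up for this example.

```python
from urllib.parse import urlsplit


def find_signal_conflicts(signals):
    """Return pairs of signal names whose target URLs disagree.

    `signals` maps a hypothetical signal name (for example "redirect",
    "rel_canonical" or "sitemap") to the URL that signal points at.
    """
    def normalize(url):
        parts = urlsplit(url)
        # Compare scheme and host case-insensitively; treat an empty path as "/".
        return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)

    names = sorted(signals)
    conflicts = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if normalize(signals[first]) != normalize(signals[second]):
                conflicts.append((first, second))
    return conflicts


page_signals = {
    "redirect": "https://example.com/page",
    "rel_canonical": "http://example.com/page",  # HTTP vs HTTPS mismatch
    "sitemap": "https://EXAMPLE.com/page",
}
print(find_signal_conflicts(page_signals))
```

In this made-up example, the canonical tag points at the HTTP version while the redirect and sitemap point at HTTPS, so the canonical tag is flagged against both of the others.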
Here is what Allan said about x-defaults as a signal:
Martin was asking me about canonicalization signals earlier. x-default is actually a signal and not inconsequential one. I don't know that it is used very commonly. It does seem to be used reasonably well when it is used. I kind of wish people would use it a bit more. To put this in perspective. You've kind of got two tools here. One of them is rel="canonical", which says, "Hey, I'm supposed to be clustered with this other page, and that other one is supposed to be canonical." x-default is more of a, "Hey, if you don't know what a locale to do or I wind up in the same cluster as this other page, that's the one you want for retrieval," and that sort of thing. It is a sort of rel="canonical" in a way, but not for clustering, just for canonical selection.
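To see which of these two tools a page actually declares, you can pull the `rel="canonical"` and `hreflang="x-default"` link elements out of its HTML. Below is a rough sketch using only Python's standard library. The class and function names are made up for illustration, and it only reads the markup; it does not reflect how Google processes these annotations.

```python
from html.parser import HTMLParser


class LinkSignalParser(HTMLParser):
    """Collects the rel=canonical and hreflang=x-default link targets."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.x_default = None

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")
        if "alternate" in rel and attr.get("hreflang", "").lower() == "x-default":
            self.x_default = attr.get("href")


def extract_link_signals(html):
    """Return the canonical and x-default URLs found in the HTML, if any."""
    parser = LinkSignalParser()
    parser.feed(html)
    return {"canonical": parser.canonical, "x_default": parser.x_default}


sample = (
    '<head>'
    '<link rel="canonical" href="https://example.com/en/">'
    '<link rel="alternate" hreflang="x-default" href="https://example.com/">'
    '<link rel="alternate" hreflang="de" href="https://example.com/de/">'
    '</head>'
)
print(extract_link_signals(sample))
```

A page that has neither annotation returns `None` for both keys, which is a quick way to spot pages that are missing x-default.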
At the end, Allan said he recently reviewed Google's public documentation on canonicalization and that it was "fairly authoritative," though it may be missing details on the x-lang default stuff. He said, "Just to follow up, there is actually a fairly authoritative external list on what webmaster signals we use in canonical selection. I actually looked it over recently and it's still basically up to date. I think the one thing that might be missing from it is x lang default is now kind of important, but the rest of them--like sitemaps, 301, rel="canonical", they're all there."
Forum discussion at X.
 
             
        
