Google's John Mueller said in an SEO hangout last Friday that it is impossible for Google to understand that one piece of content is equivalent to another when the two are in different languages. So Google basically trusts the hreflang attributes that publishers provide.
At the 26:28 mark into this video, John was asked how Google measures the similarity of pages. John said "we don't." Google just uses the "hreflang to understand which of these URLs are equivalent from your point of view and we will swap those out," John said. John added "for hreflang, I think it's impossible for us to understand that this specific content is equivalent for another country or another language."
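Since Google relies on what the publisher declares, it may help to see what that declaration looks like. Here is a minimal sketch in Python that prints the hreflang link tags for one set of equivalent pages. The function name `hreflang_links` and the example.com URLs are made up for illustration, and this assumes you list every language version (including the page itself) on each page in the set.

```python
from html import escape


def hreflang_links(alternates: dict[str, str]) -> list[str]:
    """Build <link rel="alternate"> tags for one set of equivalent pages.

    `alternates` maps an hreflang value (for example "en", "de" or
    "x-default") to the URL of that version. The same full list of tags
    is meant to go in the <head> of every page in the set.
    """
    return [
        '<link rel="alternate" hreflang="{}" href="{}" />'.format(
            escape(code, quote=True), escape(url, quote=True)
        )
        # Sort by hreflang value so the output is stable between runs.
        for code, url in sorted(alternates.items())
    ]


# Hypothetical URLs, used only to illustrate the markup.
pages = {
    "en": "https://example.com/en/pricing",
    "de": "https://example.com/de/preise",
    "x-default": "https://example.com/pricing",
}

for tag in hreflang_links(pages):
    print(tag)
```

Nothing here compares the content of the pages. The mapping is simply the publisher's own statement of which URLs are equivalent, which is the signal John describes Google using.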
Here is the transcript:
AUDIENCE: OK, so how does Google measure the similarity of pages?
JOHN MUELLER: I think we don't. I think we basically use the hreflang to understand which of these URLs are equivalent from your point of view. And we will swap those out.
AUDIENCE: Oh, OK, so not from the content point of view, maybe some--
JOHN MUELLER: No.
AUDIENCE: -- similar content.
JOHN MUELLER: No, I -- we would only do that for things like the rel canonical to understand what the canonical URL is. But for hreflang, I think it's impossible for us to understand that this specific content is equivalent for another country or another language. Like, there are so many local differences that are always possible.
Here is how Glenn Gabe summed it up:
More from @johnmu: But Google does try to understand the similarities between content for *canonicalization* (but that would be for the same language). For hreflang, that's not happening when the pages are in another language: https://t.co/dB613YXJae pic.twitter.com/QLQHRG7mm0
— Glenn Gabe (@glenngabe) January 18, 2022
Super interesting, which leads us to my next article, live in 10 minutes, on why Google doesn't use MUM for this understanding.
Forum discussion at Twitter.