John Mueller from Google gave one of the clearest, easiest-to-understand explanations of how Google uses machine learning in web search. He said Google applies it to "specific problems" where automation and machine learning can help improve the outcome. The example he gave was canonicalization, and it clears things up.
This is from the Google webmaster hangout, starting at the 37:47 mark. The example is this: "So for example we we use machine learning for canonicalization. So what that kind of means is we have all of those factors that we talked about before. And we give them individual weights. That's kind of the traditional way to do it. And we say well rel canonical has this much weight and redirect has this much weight and internal linking has this much weight. And the traditional approach would be to say well we will just make up those weights, at those numbers and see if it works out. And if we see that things don't work out we will tweak those numbers a little bit. And with machine learning what we can essentially do is say well this is the outcome that we want to have achieved and machine learning algorithms should figure out these weights on their own."
This was the first part of his answer, which went on to cover how Google debugs its search algorithm.
Here is the full transcript of this part.
The question:
Machine learning has been a part of Google search algorithm and I can imagine it's getting smarter every day. Do you as an employee with access to the secret files know the exact reason why pages rank better than others or is the algorithm now making decisions and evolving in a way that makes it impossible for humans to understand?
John's full answer:
We get this question every now and then and we're not allowed to provide an answer because the machines are telling us not to talk about this topic. So it's I really can't answer. No just kidding. It's something where we use machine learning in lots of ways to help us understand things better. But machine learning isn't just this one black box that does everything for you. Like you feed the internet in on one side the other side come out search results. It's a tool for us. It's essentially a way of testing things out a lot faster and trying things out figuring out what what the right solution there is.
So for example we we use machine learning for canonicalization. So what that kind of means is we have all of those factors that we talked about before. And we give them individual weights. That's kind of the traditional way to do it. And we say well rel canonical has this much weight and redirect has this much weight and internal linking has this much weight. And the traditional approach would be to say well we will just make up those weights, at those numbers and see if it works out. And if we see that things don't work out we will tweak those numbers a little bit. And with machine learning what we can essentially do is say well this is the outcome that we want to have achieved and machine learning algorithms should figure out these weights on their own.
So it's not so much that machine learning does everything with canonicalization on its own but rather it has this well-defined problem. It's working out like what are these numbers that we should have there as weights and kind of repeatedly trying to relearn that system and understanding like on the web this is how people do it and this is where things go wrong and that's why we should choose these numbers.
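To make the "made-up weights versus learned weights" idea concrete, here is a minimal Python sketch. It is not Google's system. The three signal names come from John's answer, but the training examples, the hand-picked weights, and the choice of plain logistic regression are all assumptions made for illustration.

```python
import math

SIGNALS = ["rel_canonical", "redirect", "internal_linking"]

# Hypothetical training data (an assumption, not real Google data).
# Each row: which signals point at a URL (1 = yes, 0 = no), and whether
# that URL is the outcome we want picked as canonical.
EXAMPLES = [
    ((1, 1, 1), 1),
    ((1, 0, 1), 1),
    ((0, 1, 1), 1),
    ((1, 1, 0), 1),
    ((0, 1, 0), 1),
    ((0, 0, 1), 0),
    ((1, 0, 0), 0),
    ((0, 0, 0), 0),
]


def score(weights, bias, signals):
    """Weighted sum of the signals; above zero means 'pick as canonical'."""
    return bias + sum(w * s for w, s in zip(weights, signals))


def accuracy(weights, bias, examples):
    """Share of examples where the weighted sum agrees with the wanted outcome."""
    hits = sum((score(weights, bias, x) > 0) == bool(y) for x, y in examples)
    return hits / len(examples)


def learn_weights(examples, epochs=5000, lr=0.5):
    """Fit the weights with logistic regression and batch gradient descent."""
    weights, bias = [0.0] * len(SIGNALS), 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in examples:
            predicted = 1.0 / (1.0 + math.exp(-score(weights, bias, x)))
            error = predicted - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Traditional approach: make up the numbers and see if it works out.
hand_weights, hand_bias = [1.0, 1.0, 1.0], -1.5

# Machine learning approach: state the outcome, let the algorithm find the numbers.
learned_weights, learned_bias = learn_weights(EXAMPLES)

print("hand-picked accuracy:", accuracy(hand_weights, hand_bias, EXAMPLES))
print("learned accuracy:", accuracy(learned_weights, learned_bias, EXAMPLES))
print("learned weights:", dict(zip(SIGNALS, learned_weights)))
```

In this toy data a redirect on its own is enough to pick the URL, while a rel canonical on its own is not, so the equal hand-picked weights get one case wrong and the learned weights end up giving the redirect more weight. The problem stays well defined, and only the numbers are handed over to the algorithm.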
So when it comes to debugging that. We still have those numbers, we still have those weights there. It's just that they're determined by machine learning algorithms. And if we see that things go wrong then we need to find a way like how could we tell the machine learning algorithm actually in this case we should have taken into account, I don't know phone numbers on a page more rather than just the pure content, to kind of separate like local versions for example. And that's something that we can do when we kind of train these algorithms.
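The debugging point can be sketched the same way. In this hypothetical Python example, a model that only sees content similarity cannot tell true duplicates from near-identical local versions, and adding a phone number signal and retraining fixes that. The page pairs, the feature names (`content_similarity`, `same_phone`), and the use of logistic regression are assumptions for illustration, not a description of what Google actually runs.

```python
import math

# Hypothetical page pairs (assumed data): (content_similarity, same_phone)
# and whether the two pages should be folded together as duplicates.
PAIRS = [
    ((0.95, 1), 1),  # same content, same phone number: duplicates
    ((0.90, 1), 1),
    ((0.95, 0), 0),  # same content, different phone: local versions
    ((0.92, 0), 0),
    ((0.20, 1), 0),
    ((0.10, 0), 0),
    ((0.30, 0), 0),
]


def score(weights, bias, row):
    """Weighted sum of the features; above zero means 'treat as duplicates'."""
    return bias + sum(w * v for w, v in zip(weights, row))


def train(examples, epochs=20000, lr=0.5):
    """Logistic regression fitted with batch gradient descent."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, label in examples:
            predicted = 1.0 / (1.0 + math.exp(-score(weights, bias, row)))
            error = predicted - label
            for j, value in enumerate(row):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def accuracy(weights, bias, examples):
    """Share of pairs where the model agrees with the wanted outcome."""
    hits = sum((score(weights, bias, r) > 0) == bool(y) for r, y in examples)
    return hits / len(examples)


# Before: the model only looks at the pure content.
content_only = [((similarity,), label) for (similarity, _), label in PAIRS]
w_before, b_before = train(content_only)

# After: phone numbers are taken into account and the model is retrained.
w_after, b_after = train(PAIRS)

print("content only accuracy:", accuracy(w_before, b_before, content_only))
print("with phone accuracy:", accuracy(w_after, b_after, PAIRS))
print("weights after retraining:",
      dict(zip(["content_similarity", "same_phone"], w_after)))
```

The learned weights are still ordinary numbers that can be printed and inspected, which matches John's point that the weights are still there. When the outcome is wrong, the fix in this sketch is to give the training step a signal it was missing rather than to edit the numbers by hand.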
So with with all of these machine learning things it's not that there's one black box and it just does everything and nobody knows why it does things. But rather we try to apply it to specific problems where it makes sense to automate things a little bit in a way that saves us time and that helps to pull out patterns that maybe we wouldn't have recognized manually if we looked at it.
Here is the video embed:
Here is how Glenn Gabe summed it up on Twitter:
More from @johnmu: Machine learning helps us pull out patterns we might have missed. And for debugging, Google can see those weights which are determined by ML algos. If there is something that needs to be improved, Google can work to train the algorithms: https://t.co/J6rDeA68KP pic.twitter.com/Su2pqPKYww
— Glenn Gabe (@glenngabe) December 16, 2019
Forum discussion at Twitter.