Over the past several weeks and months, it seems more and more site owners and SEOs have been complaining that Google is not indexing their websites as it once did. I asked Google's John Mueller if anything was new here and he said no, Google has not changed how it indexes sites in terms of quality or depth; what may have changed is that Google Search Console now reports it more accurately.
I asked John Mueller about this in last week's Google hangout, towards the beginning:
Barry: Could we start with some indexing? It seems like a lot of people are complaining about indexing issues for the past couple of weeks or months. Is this a new thing where Google's like, you know what, we're not gonna go ahead and index as many pages as we used to because we're gonna look at quality and say, all right, if the site's not at a certain quality threshold we're not gonna index as many pages? Is there anything to that?
John responded that there isn't anything new here, he said:
John: I don't think there's anything really new in that regard. But what has kind of changed over I guess the last half year or even longer is that we showed this information a little bit more visibly in Search Console.
So it's not so much that we're changing how we do indexing, how we select the pages that we use for indexing or crawling. It's more that Search Console will tell you we've seen these URLs but we decided not to index them.
And then people take that information and they're like, oh, I must be doing something wrong, therefore I need to fix something. Or maybe Google is broken or something like that because it's flagging all of these pages for me.
But it's something that, at least as far as I know, has been the case at Google since the beginning: we just can't index the whole web. There are so many things happening across the whole web that we can't possibly keep up with everything all the time. So we have to make decisions along the way; we figure out which pages we really need to have crawled and indexed, and maybe which pages it's like, well, it's interesting that you wrote this, but at the moment we don't really know what we should do with this information, so maybe we should just not index it at the moment.
I think it's kind of tricky because on the one hand, when I talk to the Search Console folks, we tell them that this confuses people and they feel they need to fix something because we're flagging it as something not being indexed. But at the same time it's also the kind of information we want to give people, where we essentially want to tell them, hey, we noticed you put this stuff out but we decided not to index it. We don't really have a human readable reason for why we decided not to index it. But we still want to give people the information that we understand your site is this big but we're just picking up a small part of it for indexing at the moment.
I went ahead and double checked to be clear:
Barry: Okay, so just to be clear, it's possibly a reporting thing that has changed in the past several months, but not anything new with how Google determines what to index or anything like that?
He then responded:
John: Yeah, yeah, I mean it's something that happens with crawling all the time essentially. You see it as an SEO if you're looking at your server logs; you probably notice it there first. You realize you put out a hundred thousand pages and Google is crawling twenty thousand of them. And essentially we might know about these hundred thousand pages because they're in the sitemap file, but we're just crawling a small part of them. So therefore we can only really index that small part of your site. And as you improve the quality of your website, as you improve things around your website, maybe improve some technical details, then the number of URLs that we crawl tends to go up.
And from my point of view that's been the case since, I don't know, since the beginning. I think in Gary's crawl budget blog post he also touched upon this, kind of the crawl demand, around how much we want to crawl from your website. And that also changes over time, and if we don't want to crawl a lot of stuff from your website then essentially we're not going to be able to index all of that.
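As an aside, the server log check John describes is easy to approximate yourself. Here is a minimal sketch, assuming a combined-format access log and a standard sitemap.xml (both file paths are hypothetical placeholders), that compares how many of your sitemap URLs Googlebot actually requested against how many the sitemap lists:

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical paths -- adjust for your own server setup.
ACCESS_LOG = "access.log"   # combined log format
SITEMAP = "sitemap.xml"

# In the combined log format the request line is quoted, e.g. "GET /page HTTP/1.1".
LOG_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')

# Collect the paths Googlebot requested (crude user-agent match;
# verify by reverse DNS if accuracy matters).
crawled = set()
with open(ACCESS_LOG, encoding="utf-8", errors="replace") as f:
    for line in f:
        if "Googlebot" not in line:
            continue
        m = LOG_RE.search(line)
        if m:
            crawled.add(urlparse(m.group(1)).path)

# Collect the paths listed in the sitemap (standard sitemap namespace).
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
tree = ET.parse(SITEMAP)
listed = {urlparse(loc.text.strip()).path
          for loc in tree.findall(".//sm:loc", ns)}

print(f"URLs in sitemap:        {len(listed)}")
print(f"URLs Googlebot fetched: {len(crawled & listed)}")
print(f"Listed but not crawled: {len(listed - crawled)}")
```

If the "listed but not crawled" number is large and stays large over time, that lines up with what John describes: Google knows about the URLs from the sitemap but is choosing to crawl and index only a portion of them.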
Here is the video embed:
So can this be a Google Search Console thing? Google updated the index coverage report in mid-December and then again in August 2018.
What do you all think?
Forum discussion at YouTube Community.