Did you know that Google Search checks about four billion host names each and every day for robots.txt purposes? Gary Illyes said in the December Search Off The Record podcast "we have about four billion host names that we check every single day for robots.txt."
He said this at the 20:31 mark in the video, adding that if Google checks four billion host names daily, then "the number of sites is probably over or very likely over four billion."
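For context on what a single per-hostname robots.txt check involves, here is a minimal sketch using Python's standard urllib.robotparser module. The hostname and user agent strings are hypothetical examples for illustration; this is not how Google's crawling infrastructure is implemented.

```python
# Illustration only: fetch and consult one host's robots.txt file.
# The hostname and user agent below are hypothetical examples.
from urllib.robotparser import RobotFileParser

hostname = "https://www.example.com"  # hypothetical host

parser = RobotFileParser()
parser.set_url(f"{hostname}/robots.txt")
parser.read()  # fetches and parses the robots.txt file for this host

# Ask whether a hypothetical crawler may fetch a given URL on this host.
allowed = parser.can_fetch("ExampleBot", f"{hostname}/some/page")
print(f"Allowed to crawl: {allowed}")
```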
I spotted this video via Glenn Gabe:
Google's Gary Illyes in the latest SOTR Podcast: Google has about 4 billion hostnames that it checks every single day for robots.txt https://t.co/Irc2outOM4 pic.twitter.com/lyb68pnR7d
— Glenn Gabe (@glenngabe) December 22, 2023
Here is the transcript:
GARY ILLYES: Yeah, and I mean, that's one of the problems that we brought up early on. If we implement something or if we come up or suggest something that could work, that should not put more strain on publishers because if you think about it, if you go through our robots.txt cache, you can see that we have about four billion host names that we check every single day for robots.txt. Now, let's say that all of those have subdirectories, for example. So the number of sites is probably over or very likely over four billion.
JOHN MUELLER: How many of those are in Search Console? I wonder.
GARY ILLYES: John, stop it.
JOHN MUELLER: I'm sorry.
GARY ILLYES: Anyway, so if you have four billion hostnames plus a bunch more in subdirectories, then how do you implement something that will not make them go bankrupt when they want to implement some opt out mechanism?
JOHN MUELLER: It's complicated.
GARY ILLYES: It's complicated. And I know that people are frustrated that we don't have something already. But it's not something to--
MARTIN SPLITT: Be taken lightly, yeah.
GARY ILLYES: Yeah.
Here is the video embed at the start time:
Forum discussion at X.