Google Doesn't Technically Follow Links, It Extracts, Collects & Checks Later

Aug 16, 2024 - 7:31 am


Google's Gary Illyes clarified on the Search Off The Record podcast that Google technically does not follow links. Instead, Google extracts the links, collects them in a database, and then checks them later. Of course, most of you know this already, and knowing the difference doesn't really matter much for SEO, but hey.

Gary Illyes from Google said at the 25:26 mark in that podcast:

Well, yeah, it's my pet peeve. On onesie [Google Search Central Site], we keep saying Googlebot is following links, like, no, it's not following links. It's collecting links, and then it goes back to those links. It's not like properly following links. The picture that we are painting is that Googlebot is like hopping from--

Gary then wrote a short post on LinkedIn explaining more. "You probably heard it before that Googlebot 'follows' links. It doesn't. But it's a pretty illustrative way to describe what Googlebot does," he said.

He wrote:

A recent Search Off the Record episode (https://lnkd.in/eG566yve) caused some ruckus because we apparently "leaked" that Googlebot doesn't just "follow" links it finds in a page it just downloaded. If you ever spent some time analyzing your server's access logs in the past, say, 15 years, you already knew that that's not the case. There's more involved than just blindly making a request to URLs found in <a> elements; there's deduplication across protocol variants, there's prioritization of URLs, there's coffee or lack of, thereof.
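To make the "collect, then go back later" idea concrete, here is a minimal Python sketch of a crawl frontier. It is not Googlebot's code and makes no claim about how Google implements any of this: the names `LinkExtractor`, `Frontier`, `dedup_key` and `priority`, the scheme-agnostic deduplication key, the path-depth priority score and the fetch budget are all hypothetical illustrations. Links are pulled out of `<a>` elements and stored; nothing is fetched at extraction time, and whatever doesn't fit into a fetch batch simply waits, loosely mirroring a URL that has been discovered but not yet crawled.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects href values from <a> elements. It never fetches anything."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the URL of the page they were found on.
                    self.links.append(urljoin(self.base_url, value))


def dedup_key(url):
    """Hypothetical normalization: ignore http vs. https, lowercase the host
    and drop the fragment, so protocol variants collapse into one entry."""
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path or "/", parts.query)


def priority(url):
    """Hypothetical score: shallower paths are fetched first (lower = sooner)."""
    return urlsplit(url).path.rstrip("/").count("/")


class Frontier:
    """Stores discovered URLs; a separate scheduling step fetches them later."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker keeps insertion order among equal priorities

    def add(self, url):
        key = dedup_key(url)
        if key in self._seen:
            return False  # already discovered, possibly under another protocol
        self._seen.add(key)
        heapq.heappush(self._heap, (priority(url), self._counter, url))
        self._counter += 1
        return True

    def next_batch(self, budget):
        """Pop up to `budget` URLs; the rest stay discovered but not crawled."""
        batch = []
        while self._heap and len(batch) < budget:
            batch.append(heapq.heappop(self._heap)[2])
        return batch

    def pending(self):
        return len(self._heap)


if __name__ == "__main__":
    page = """
    <a href="/about">About</a>
    <a href="https://example.com/about">About (https)</a>
    <a href="http://example.com/blog/2024/post">Post</a>
    <a href="/blog/">Blog</a>
    """
    extractor = LinkExtractor("http://example.com/")
    extractor.feed(page)  # step 1: extract links, no requests are made

    frontier = Frontier()
    for link in extractor.links:
        frontier.add(link)  # step 2: collect, deduplicating the https variant

    print(frontier.next_batch(budget=2))  # step 3: go back to some of them later
    # -> ['http://example.com/about', 'http://example.com/blog/']
    print(frontier.pending())
    # -> 1
```

The point of the design is the separation: extraction only records URLs, and the decision about when (or whether) to request each one happens later, based on deduplication and priority rather than on the order in which links appeared on the page.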

So why "follow" then? As much as I don't like it, it is a very simple way to explain what Googlebot actually does. There's value in using simple analogies (similes?), but there's also a place for going for more in-depth explanations. You choose the one that you think will work for the audience you're talking to at the time.


Gary also added, in a comment deep inside a LinkedIn thread written in a different language, "btw, we have another link extraction system in the indexing process (for fancy/stupid links)."

There is also this question from Kristine Schachinger, who asked, "I am confused. I know that Google can trip dynamic sites to 'create pages' from internal links, which I assumed only happens on crawl, so how does that happen in this scenario?" Gary responded, saying, "I don't think there's a relation between the two things. Crawlers see a link and eventually they go back to that link (and if they don't, at least in Googlebot's case, you end up with 'Discovered, not crawled', or whatever Search Console reports). If they go back, the new page is dynamically created. The thing we've used to do with wget to recursively download stuff in ~realtime doesn't exist with modern crawlers."

So Google extracts links in more than one place (during crawling and again during indexing), and it does not immediately follow the links it extracts.

Forum discussion at LinkedIn.
