Several weeks ago, Google told us crawl budget is not what we think it is and promised to write a blog post explaining how it defines crawl budget. Well, Gary Illyes has now published What Crawl Budget Means for Googlebot on the Google blog.
In short, "crawl rate and crawl demand together," Google said is how they "define crawl budget as the number of URLs Googlebot can and wants to crawl."
The crawl rate limit is how much Googlebot allows itself to crawl your pages so that it doesn't overload your server and do harm. It is the "number of simultaneous parallel connections Googlebot may use to crawl the site, as well as the time it has to wait between the fetches." Google won't crawl so much that it takes your server down, and it will respect the limit you set in Google Search Console.
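To picture what that rate limit means in practice, here is a toy Python sketch of a fetcher capped at a few parallel connections with a wait between fetches. This is not Googlebot's code; the URLs, connection cap and delay are made-up examples.

```python
# Illustrative only: a toy crawler that honors a "crawl rate limit" the way the
# post describes it -- a cap on simultaneous connections plus a wait between fetches.
# The site, URLs and limits below are invented, not Google's values.
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_CONNECTIONS = 2   # "simultaneous parallel connections"
DELAY_BETWEEN_FETCHES = 1.0    # "time it has to wait between the fetches", in seconds

def fetch(url):
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            status, size = resp.status, len(resp.read())
    except urllib.error.URLError as err:
        status, size = getattr(err, "code", "error"), 0
    time.sleep(DELAY_BETWEEN_FETCHES)  # back off before this worker fetches again
    return url, status, size

urls = ["https://www.example.com/page-%d" % i for i in range(1, 6)]

# The thread pool size enforces the connection cap; the sleep enforces the delay.
with ThreadPoolExecutor(max_workers=MAX_PARALLEL_CONNECTIONS) as pool:
    for url, status, size in pool.map(fetch, urls):
        print(url, status, size, "bytes")
```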
Crawl demand is how much Google wants to crawl your pages, both new and old. Demand is driven by the popularity of your site and URLs, and by staleness: Google's systems attempt to prevent URLs from becoming stale in the index. Even if you have crawl rate limit to spare, if there is no demand for more crawling, Googlebot won't crawl more.
Google said these two make up crawl budget. "Taking crawl rate and crawl demand together we define crawl budget as the number of URLs Googlebot can and wants to crawl," Google wrote.
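If you want to see how "can and wants to crawl" might combine, here is a rough back-of-the-envelope sketch in Python. It is purely illustrative; the capacity math, the demand rule and all the numbers are my own assumptions, not anything Google published.

```python
# A back-of-the-envelope illustration of the definition above, not Google's formula:
# crawl budget is bounded both by what the server can take (crawl rate) and by how
# many URLs Google actually wants to recrawl (crawl demand). All numbers are invented.

def crawl_capacity(parallel_connections, seconds_between_fetches, window_seconds=86400):
    """URLs per day the crawl rate limit allows ("can crawl")."""
    return int(parallel_connections * (window_seconds / seconds_between_fetches))

def urls_in_demand(urls):
    """URLs Google "wants to crawl": popular pages and pages going stale in the index."""
    return [u for u in urls if u["popular"] or u["days_since_last_crawl"] > u["freshness_days"]]

site_urls = [
    {"url": "/", "popular": True, "days_since_last_crawl": 1, "freshness_days": 1},
    {"url": "/old-post", "popular": False, "days_since_last_crawl": 40, "freshness_days": 30},
    {"url": "/quiet-page", "popular": False, "days_since_last_crawl": 3, "freshness_days": 30},
]

capacity = crawl_capacity(parallel_connections=2, seconds_between_fetches=1.0)
wanted = urls_in_demand(site_urls)
crawl_budget = min(capacity, len(wanted))  # "can and wants to crawl"
print(f"capacity={capacity}, in demand={len(wanted)}, budget={crawl_budget}")
```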
What impacts crawl budget?
- Faceted navigation and session identifiers
- On-site duplicate content
- Soft error pages
- Hacked pages
- Infinite spaces and proxies
- Low quality and spam content
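To see why items like faceted navigation and session identifiers are on that list, here is a small Python sketch that strips some made-up facet and session parameters and counts how many crawled URLs collapse into the same page. The parameter names are hypothetical, purely to illustrate the duplication.

```python
# One way to see why faceted navigation and session IDs eat crawl budget: many URLs
# collapse to the same content. This sketch normalizes made-up facet/session parameters
# (the parameter names are assumptions for illustration) and counts the duplicates.
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

IGNORED_PARAMS = {"sessionid", "sort", "color", "size"}  # hypothetical facet/session params

def normalize(url):
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in IGNORED_PARAMS]
    return urlunparse(parts._replace(query=urlencode(sorted(kept))))

crawled = [
    "https://shop.example.com/shoes?color=red&sessionid=abc123",
    "https://shop.example.com/shoes?sessionid=xyz789&color=blue",
    "https://shop.example.com/shoes?sort=price",
    "https://shop.example.com/boots",
]

unique_pages = {normalize(u) for u in crawled}
print(f"{len(crawled)} crawled URLs -> {len(unique_pages)} unique pages")
# 4 crawled URLs -> 2 unique pages: the other fetches spent crawl budget on duplicates.
```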
Crawl rate is not a ranking factor, said Google. Google wrote, "an increased crawl rate will not necessarily lead to better positions in Search results. Google uses hundreds of signals to rank the results, and while crawling is necessary for being in the results, it's not a ranking signal."
For more details and a lot more FAQs, see the blog post.