Googlebot Crawl Budget Explained by Google’s Gary Illyes

“Crawl budget” is a term SEOs use frequently, yet it has no single, universally accepted definition. Even Google doesn’t have one term that describes everything “crawl budget” stands for.

It’s a complex concept with multiple components. Gary Illyes, a Google Webmaster Trends Analyst, has published a detailed explanation of what crawl budget means for Googlebot.

Here are the key points from Illyes’ article:

Crawl Budget Explained

Crawl Rate Limit
When Googlebot crawls a site, it limits both the number of simultaneous parallel connections it uses and the time it waits between fetches. This is known as the “crawl rate limit,” and the limit varies from site to site.

Crawl rate limit is determined by two factors. The first is crawl health: if the site responds quickly, Googlebot can use more connections; if the site slows down or starts returning server errors, Googlebot reduces its crawling to avoid degrading the experience for real visitors.

The second factor is the limit set in Search Console, where site owners can manually reduce Googlebot’s crawling of their site in the Site Settings section. Note that setting a higher limit doesn’t automatically increase crawling.
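
As a rough illustration of these two levers, the toy Python sketch below models a per-site crawl rate limit as a connection cap plus a delay between fetches, backing off when responses slow. This is only a conceptual sketch for intuition; the class, thresholds, and adjustment rules are invented here and are not Googlebot’s actual algorithm.

```python
class CrawlRateLimiter:
    """Toy model of a per-site crawl rate limit: a cap on simultaneous
    connections plus a wait time between fetches, adjusted by "crawl
    health" (how quickly the site responds). All numbers are invented."""

    def __init__(self, max_connections=5, delay_seconds=2.0):
        self.max_connections = max_connections
        self.delay_seconds = delay_seconds

    def record_response(self, response_seconds):
        if response_seconds < 0.5:
            # Site responds quickly: allow more parallel fetches, shorter waits.
            self.max_connections = min(self.max_connections + 1, 20)
            self.delay_seconds = max(self.delay_seconds * 0.8, 0.1)
        elif response_seconds > 2.0:
            # Site is slowing down: back off so crawling doesn't degrade
            # the experience of real visitors.
            self.max_connections = max(self.max_connections - 1, 1)
            self.delay_seconds = min(self.delay_seconds * 1.5, 60.0)


limiter = CrawlRateLimiter()
for seconds in [0.2, 0.3, 0.2, 3.5, 4.1]:  # simulated response times
    limiter.record_response(seconds)
print(limiter.max_connections, limiter.delay_seconds)
```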

Crawl Demand
Crawl rate limit means little if there’s no demand from indexing: low demand results in low activity from Googlebot even when the limit isn’t reached. Crawl demand is driven by two factors. URLs that are more popular on the internet tend to be crawled more often to keep them fresh in Google’s index, and Google’s systems also attempt to prevent any URL from going stale in the index.

Crawl demand can also be affected by site-wide events like site moves, which increase demand as Googlebot needs to reindex new URLs.

The combination of crawl rate and crawl demand offers a clearer definition of crawl budget, which Illyes describes as “the number of URLs Googlebot can and wants to crawl.”
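
One way to read that definition, offered here as a hedged sketch rather than any published Google formula: the effective budget is capped both by what the site can serve and by what Google wants to fetch.

```python
def effective_crawl_budget(can_crawl: int, wants_to_crawl: int) -> int:
    """Illustrative reading of Illyes' definition: the number of URLs
    Googlebot *can* crawl (crawl rate limit) and *wants* to crawl
    (crawl demand). Whichever is smaller is the binding constraint."""
    return min(can_crawl, wants_to_crawl)

# A fast server with little indexing demand still sees few fetches,
# and high demand can't push crawling past the rate limit.
print(effective_crawl_budget(can_crawl=5000, wants_to_crawl=300))   # 300
print(effective_crawl_budget(can_crawl=800, wants_to_crawl=10000))  # 800
```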

Factors Affecting Crawl Budget

To maintain an optimal crawl budget, Illyes advises against wasting resources on low-value-add URLs, which can divert crawl activity from high-quality content.

Illyes identifies the following as low-value-add URLs:

  • Faceted navigation and session identifiers (see the robots.txt sketch after this list)
  • On-site duplicate content
  • Soft error pages
  • Hacked pages
  • Infinite spaces and proxies
  • Low quality and spam content
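
Illyes’ post lists these URL types without prescribing fixes, but a common remedy for faceted navigation and session-identifier URLs is to disallow the offending patterns in robots.txt. The sketch below checks hypothetical rules with Python’s standard urllib.robotparser; note that the standard-library parser only does prefix matching (unlike Googlebot, which also supports * wildcards), and all paths and parameter names here are invented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules blocking a session-ID URL pattern and a
# faceted-search section so crawlers don't burn budget on them.
rules = """\
User-agent: *
Disallow: /products?sessionid=
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for url in [
    "https://example.com/products",                   # plain page: crawlable
    "https://example.com/products?sessionid=abc123",  # session ID: blocked
    "https://example.com/search/shoes?sort=price",    # faceted search: blocked
]:
    verdict = "crawlable" if parser.can_fetch("*", url) else "blocked"
    print(verdict, url)
```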

Other Notes About Crawl Budget

  • The faster the site, the higher the crawl rate.
  • Monitor the Crawl Errors report in Search Console and minimize server errors.
  • An increased crawl rate is not a ranking signal and won’t by itself lead to better positions in search results.
  • Alternate URLs (such as AMP or hreflang), embedded content (such as CSS and JavaScript), and long redirect chains all consume crawl budget, and long chains can also hurt crawling (see the sketch after this list).
  • URLs marked nofollow can still be crawled, and can still consume crawl budget, if another page on the site or elsewhere on the web links to them without nofollow.
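
As a practical companion to the last two points about server errors and redirect chains, here is a small sketch (the one referenced above) using the third-party requests library. It follows a URL one hop at a time, flagging 5xx responses and counting redirects; the URL and the hop threshold are placeholders, not values from Illyes’ article.

```python
import requests

def audit_url(url, max_hops=10):
    """Follow a URL's redirect chain hop by hop, reporting each status
    code, any server (5xx) errors, and the total number of redirects."""
    hops = 0
    current = url
    while hops <= max_hops:
        resp = requests.get(current, allow_redirects=False, timeout=10)
        print(resp.status_code, current)
        if resp.status_code >= 500:
            print(f"  server error at {current}; check the Crawl Errors report")
        if resp.is_redirect:
            # Resolve relative Location headers against the current URL.
            current = requests.compat.urljoin(current, resp.headers["Location"])
            hops += 1
        else:
            break
    if hops > 1:
        print(f"redirect chain of {hops} hops for {url}; "
              f"consider linking directly to the final URL")

audit_url("https://example.com/old-page")  # placeholder URL
```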
