
Google’s Gary Illyes Issues Ongoing Warnings About URL Parameter Problems

SEO September 29, 2024


Google’s Gary Illyes recently highlighted a recurring SEO problem on LinkedIn, echoing concerns he’d previously voiced on a podcast.

The issue? URL parameters make websites harder for search engines to crawl.

This problem is especially challenging for big sites and online stores. When different parameters are added to a URL, it can result in numerous unique web addresses that all lead to the same content.

This duplication slows search engines down and makes crawling and indexing a site less efficient.

The URL Parameter Conundrum

In both the podcast and LinkedIn post, Illyes explains that URLs can accommodate infinite parameters, each creating a distinct URL even if they all point to the same content.

He writes:

“An interesting quirk of URLs is that you can add an infinite (I call BS) number of URL parameters to the URL path, and by that essentially forming new resources. The new URLs don’t have to map to different content on the server even, each new URL might just serve the same content as the parameter-less URL, yet they’re all distinct URLs. A good example for this is the cache busting URL parameter on JavaScript references: it doesn’t change the content, but it will force caches to refresh.”

He provided an example of how a simple URL like “/path/file” can expand to “/path/file?param1=a” and “/path/file?param1=a&param2=b”, all potentially serving identical content.

“Each [is] a different URL, all the same content,” Illyes noted.
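To make the point concrete, here is a minimal Python sketch of that example. The host example.com is a placeholder that does not appear in the post; the paths and parameters are the ones Illyes used. It only shows that the three strings are distinct URLs sharing one path, not how any particular server responds.

```python
from urllib.parse import urlsplit

# Placeholder host; the paths and parameters mirror Illyes's example.
urls = [
    "https://example.com/path/file",
    "https://example.com/path/file?param1=a",
    "https://example.com/path/file?param1=a&param2=b",
]

# As strings, all three URLs differ, so a crawler has to treat each one
# as a separate resource until it has fetched and compared them.
distinct_urls = set(urls)

# The path component is identical; only the query string varies.
distinct_paths = {urlsplit(u).path for u in urls}

print(len(distinct_urls), "distinct URLs,", len(distinct_paths), "distinct path")
```

If the server ignores param1 and param2, every one of these addresses returns the same page, yet each still has to be crawled to find that out.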


Accidental URL Expansion & Its Consequences

Search engines can sometimes find and try to crawl non-existent pages on your site, which Illyes calls “fake URLs.”

These can pop up due to things like poorly coded relative links. What starts as a normal-sized site with around 1,000 pages could balloon to a million phantom URLs.

This explosion of fake pages can cause serious problems. Search engine crawlers might hit your servers hard, trying to crawl all these non-existent pages.

This can overwhelm your server resources and potentially crash your site. Plus, it wastes the search engine’s crawl budget on useless pages instead of your content.

In the end, your pages might not get crawled and indexed properly, which could hurt your search rankings.

Illyes states:

“Sometimes you might create these new fake URLs accidentally, exploding your URL space from a balmy 1000 URLs to a scorching 1 million, exciting crawlers that in turn hammer your servers unexpectedly, melting pipes and whistles left and right. Bad relative links are one relatively common cause. But robotstxt is your friend in this case.”
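One way a bad relative link can inflate the URL space is a missing leading slash. The sketch below is an illustration under assumptions, not a case from the post: it uses a hypothetical page at example.com/blog/ whose template emits the link "blog/" where "/blog/" was intended, and it assumes the server answers every resulting path with the same template.

```python
from urllib.parse import urljoin

def follow(start_url: str, href: str, hops: int) -> list[str]:
    """Resolve the same href repeatedly, the way a crawler following it would."""
    visited = [start_url]
    for _ in range(hops):
        # Each new page contains the same link, resolved against its own URL.
        visited.append(urljoin(visited[-1], href))
    return visited

# Hypothetical page and links; "blog/" is the buggy relative form.
broken = follow("https://example.com/blog/", "blog/", 3)
fixed = follow("https://example.com/blog/", "/blog/", 3)

for url in broken:
    print(url)
```

With the buggy link, each hop produces a deeper, never-before-seen URL (/blog/blog/, /blog/blog/blog/, and so on), while the root-relative form keeps resolving to the same address.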

E-commerce Sites Most Affected

The LinkedIn post didn’t specifically call out online stores, but the podcast discussion made clear that this issue is a big deal for e-commerce platforms.

These websites typically use URL parameters to handle product tracking, filtering, and sorting.

As a result, you might see several different URLs pointing to the same product page, with each URL variant representing color choices, size options, or where the customer came from.
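A rough sketch of how quickly those variants multiply is shown below. The shop URL, the parameter names, and their values are all made up for illustration; real stores will have different parameters and often many more of them.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical parameters for a single product page.
options = {
    "color": ["red", "blue", "green"],
    "size": ["s", "m", "l", "xl"],
    "utm_source": ["newsletter", "ads", "social"],
}

def variant_urls(base: str, options: dict[str, list[str]]) -> list[str]:
    """Build one URL for every combination of parameter values."""
    urls = []
    for combo in product(*options.values()):
        query = urlencode(dict(zip(options.keys(), combo)))
        urls.append(f"{base}?{query}")
    return urls

urls = variant_urls("https://shop.example/product/123", options)
print(len(urls))  # 3 colors * 4 sizes * 3 sources = 36
```

Three small parameters already turn one product page into 36 crawlable addresses, before counting URLs that carry only some of the parameters or list them in a different order.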

Mitigating The Issue

Illyes consistently recommends using robots.txt to tackle this issue.

On the podcast, Illyes highlighted possible fixes, such as:

Creating systems to spot duplicate URLs
Better ways for site owners to tell search engines about their URL structure
Using robots.txt in smarter ways to guide search engine bots
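As an illustration of the first idea, a site owner could group URLs that differ only in parameters known not to change the content. This is a simplified sketch, not something Illyes or Google described: the IGNORED_PARAMS set and the shop.example URLs are assumptions, and deciding which parameters are safe to ignore is site-specific.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: on this hypothetical site these parameters never change the content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "sessionid", "v"}

def normalize(url: str) -> str:
    """Drop ignorable parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def group_duplicates(urls: list[str]) -> dict[str, list[str]]:
    """Map each normalized URL to the original URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return dict(groups)

crawled = [
    "https://shop.example/p?color=red&utm_source=ads",
    "https://shop.example/p?utm_source=newsletter&color=red",
    "https://shop.example/p?color=blue",
]
for canonical, originals in group_duplicates(crawled).items():
    print(canonical, len(originals))
```

Here the first two URLs collapse into one group because they differ only in a tracking parameter, while the blue variant stays separate.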

In one reported case, blocking parameters via robots.txt helped a website cut the crawling of tens of thousands of URLs that Google had been requesting with nonsensical parameter values, all of which resulted in 404 pages.
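For readers who want to see what such rules can look like, the sketch below embeds a hypothetical robots.txt in Python and checks sample paths against it. The parameter names sort and sessionid are invented for the example. The matcher is a deliberately simplified take on the wildcard syntax Google documents ("*" for any run of characters, a trailing "$" for end of URL); it ignores user-agent groups and Allow rules, so treat it as a way to reason about patterns rather than a full parser.

```python
import re

# Hypothetical rules; the parameter names are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*?*sort=
Disallow: /*?*sessionid=
"""

def disallow_patterns(robots_txt: str) -> list[str]:
    """Collect Disallow values, ignoring user-agent grouping for brevity."""
    patterns = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns

def to_regex(pattern: str) -> re.Pattern:
    """Translate '*' (any characters) and a trailing '$' (end of URL) into a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))

def is_blocked(path_and_query: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(to_regex(p).match(path_and_query) for p in disallow_patterns(robots_txt))

for candidate in ["/shoes", "/shoes?color=red", "/shoes?color=red&sort=price"]:
    print(candidate, is_blocked(candidate))
```

Only the URL carrying the sort parameter is blocked here. Note that a pattern this loose would also match a parameter whose name merely ends in "sort", which is one reason such rules deserve testing against real URLs before they are deployed.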

The Deprecated URL Parameters Tool

In the podcast discussion, Illyes touched on Google’s past attempts to address this issue, including the now-deprecated URL Parameters tool in Search Console.

This tool allowed websites to indicate which parameters were important and which could be ignored.

When asked on LinkedIn about potentially bringing back this tool, Illyes was skeptical about its practical effectiveness.

He stated, “In theory yes. in practice no,” explaining that the tool suffered from the same issues as robots.txt, namely that “people couldn’t for their dear life figure out how to manage their own parameters.”

Implications for SEO and Web Development

This ongoing discussion from Google has several implications for SEO and web development:

Crawl Budget: For large sites, managing URL parameters can help conserve crawl budget, ensuring that important pages are crawled and indexed.
Site Architecture: Developers may need to reconsider how they structure URLs, particularly for large e-commerce sites with numerous product variations.
Faceted Navigation: E-commerce sites using faceted navigation should be mindful of how this impacts URL structure and crawlability.
Canonical Tags: Canonical tags help Google understand which URL version should be considered primary.