
Pages Blocked by Robots.txt Can Be Indexed if They’re Linked To

John Mueller of Google has cautioned that pages blocked by robots.txt can still end up being indexed if other pages link to them.

This can be a problem because the crawl block prevents Google from reading those pages, so their URLs may be indexed with no content at all.

Mueller recommends using a noindex meta tag instead if there are parts of your site you want to keep out of Google's index.
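
For reference, the noindex directive he describes is typically placed in the page's head section, for example:

    <meta name="robots" content="noindex">

For non-HTML resources, the same directive can also be sent as an X-Robots-Tag: noindex HTTP response header.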

The subject came up during a recent Webmaster Central hangout, when a site owner asked whether simply disallowing non-essential pages in robots.txt would be enough to keep them out of the index.

Mueller’s full response is quoted below:

“One thing maybe to keep in mind here is that if these pages are blocked by robots.txt, then it could theoretically happen that someone randomly links to one of these pages. And if they do that then it could happen that we index this URL without any content because it’s blocked by robots.txt. So we wouldn’t know that you don’t want to have these pages actually indexed.

Whereas if they’re not blocked by robots.txt you can put a noindex meta tag on those pages. And if anyone happens to link to them, and we happen to crawl that link and think ‘maybe there’s something useful here’ then we would know that these pages don’t need to be indexed and we can just skip them from indexing completely.

So, in that regard, if you have anything on these pages that you don’t want to have indexed then don’t disallow them, use noindex instead.”
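
To illustrate the distinction, a robots.txt rule like the following (the /internal/ path is only a hypothetical example) blocks crawling of a directory, but any URL under it can still be indexed without content if someone links to it:

    # Blocks crawling only; linked URLs can still be indexed without content
    User-agent: *
    Disallow: /internal/

Removing a rule like this and serving the noindex meta tag instead lets Googlebot crawl the page, see the directive, and skip it from indexing entirely, which is the approach Mueller describes above.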

The full question and answer can be viewed in a video, starting at the 24:36 mark.

For more details on robots.txt blocking and the noindex tag, consider exploring discussions on whether to noindex category and archive pages.
