Google has quietly introduced a new bot in their crawler documentation that operates on behalf of commercial clients using their Vertex AI product. According to the documentation, this new crawler may crawl websites at the site owner’s request.
Vertex AI Agents
The new crawler, named Google-CloudVertexBot, gathers website content for Vertex AI clients. This is different from other bots listed in the Search Central documentation, which are generally associated with Google Search or advertising.
The official Google Cloud documentation provides the following details:
"In Vertex AI Agent Builder, there are various kinds of data stores. A data store can contain only one type of data."
The documentation specifies six types of data, including public website data. Regarding crawling, there are two types of website crawling:
- Basic website indexing
- Advanced website indexing
Documentation
The documentation elaborates on website data:
"A data store with website data uses data indexed from public websites. You can provide a set of domains and set up search or recommendations over data crawled from the domains. This data includes text and images tagged with metadata."
The Basic website indexing description does not mention site owner verification. However, a Google representative confirmed that Basic website indexing uses a portion of data already crawled by Google.
Advanced website indexing, which involves the Google-CloudVertexBot, requires domain verification and includes indexing quotas. This suggests that the new crawler doesn’t access public websites indiscriminately but crawls specifically upon a site owner’s request.
The changelog for this new crawler states:
Here’s what the changelog mentions:
"Introducing the Google-CloudVertexBot crawler.
What: Added Google-CloudVertexBot to the list of Google crawlers, a new crawler that crawls sites on the site owners’ request when building Vertex AI Agents.
Why: The new crawler was introduced to help site owners identify the new crawler traffic."
New Google Crawler
The new crawler is named Google-CloudVertexBot.
Information about Google-CloudVertexBot:
"Google-CloudVertexBot crawls sites on the site owners’ request when building Vertex AI Agents.
User agent tokens
- Google-CloudVertexBot
- Googlebot
User agent substring
Google-CloudVertexBot"
Google-CloudVertexBot
The documentation suggests that this new crawler does not index public sites, and the changelog highlights its addition for site owners to identify traffic from this new bot. Should you consider blocking the new crawler through a robots.txt file? It appears unnecessary, as it only crawls at the site owner’s request.
Featured Image by Shutterstock/ShotPrime Studio