Google’s AI Blog recently discussed how Dataset Search works and which signals it uses to rank datasets. Google currently displays dataset rich results and may show more of them as publishers implement Schema.org markup. Understanding how datasets are ranked matters because they could become a new source of traffic.
Google’s dataset developers page was updated in May 2018, indicating that dataset rich data will soon feature in Google’s search results. Publishers are advised to add dataset structured data to their sites in anticipation of this new feature.
Dataset Search
Dataset Search relies on structured data metadata based on the Schema.org/Dataset standard. Google processes this structured data by linking it with its Knowledge Graph and combining it with other ranking signals, such as links, to build a dataset search index.
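As a rough illustration of what Schema.org/Dataset markup can look like, the sketch below builds a JSON-LD description in Python. The property names (name, description, url, sameAs, identifier, distribution) come from the Schema.org vocabulary. The helper name build_dataset_jsonld, the example.com and example.org URLs, the placeholder DOI, and the hard-coded text/csv format are assumptions made up for this example. Check Google’s developer documentation for the properties it currently requires.

```python
import json


def build_dataset_jsonld(name, description, url, same_as=None, doi=None, download_url=None):
    """Return a Schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration; only the property names are standard.
    """
    markup = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    if same_as:
        # Points back to the original publisher's page for the same dataset.
        markup["sameAs"] = same_as
    if doi:
        markup["identifier"] = doi
    if download_url:
        # Assumes a CSV download; change encodingFormat for other file types.
        markup["distribution"] = {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": download_url,
        }
    return json.dumps(markup, indent=2)


# All values below are placeholders, not a real dataset.
snippet = build_dataset_jsonld(
    name="Example rainfall measurements",
    description="Hypothetical daily rainfall totals, used only to illustrate the markup.",
    url="https://example.com/datasets/rainfall",
    same_as="https://example.org/original/rainfall",
    doi="https://doi.org/10.0000/example",
    download_url="https://example.com/datasets/rainfall.csv",
)

# The JSON-LD is embedded in the page inside a script tag.
print('<script type="application/ld+json">\n' + snippet + "\n</script>")
```

The printed script tag would go in the HTML of the page that describes the dataset.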
Duplicate Data Sets
Google partly relies on the Schema.org sameAs property to identify duplicates. The property is meant to point back to the original publisher: it links a dataset description to a specific URL that represents the data’s original source, so copies can be tied to the canonical version.
Additional signals Google uses to detect duplicate datasets include:
- Two dataset descriptions pointing to the same canonical page
- Having the same Digital Object Identifier (DOI)
- Sharing links for downloading the dataset
- Significant overlap in other metadata fields
Because none of these signals is reliable in isolation, Google combines them to decide whether two descriptions refer to the same dataset.
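To make the idea of combining signals concrete, here is a minimal sketch that scores two dataset descriptions against the four signals listed above. It is not Google’s method: the DatasetRecord fields, the weights, the 0.6 threshold, and the word-overlap measure are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical, simplified view of one dataset description."""
    name: str
    description: str
    canonical_url: str = ""
    doi: str = ""
    download_urls: set = field(default_factory=set)


def token_overlap(a, b):
    """Jaccard overlap between the lowercase word sets of two strings."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def duplicate_score(a, b):
    """Add up weighted evidence; the weights are arbitrary assumptions."""
    score = 0.0
    if a.canonical_url and a.canonical_url == b.canonical_url:
        score += 0.4  # same canonical page
    if a.doi and a.doi == b.doi:
        score += 0.4  # same DOI
    if a.download_urls & b.download_urls:
        score += 0.3  # at least one shared download link
    # Overlap in other metadata, approximated here by name plus description.
    score += 0.3 * token_overlap(
        a.name + " " + a.description, b.name + " " + b.description
    )
    return score


def likely_duplicates(a, b, threshold=0.6):
    """No single signal reaches the threshold on its own."""
    return duplicate_score(a, b) >= threshold


original = DatasetRecord(
    "City rainfall 2017", "Daily rainfall totals by station",
    canonical_url="https://example.org/rainfall", doi="10.0000/example",
    download_urls={"https://example.org/rainfall.csv"},
)
mirror = DatasetRecord(
    "Rainfall 2017 mirror", "Daily rainfall totals",
    canonical_url="https://example.org/rainfall", doi="10.0000/example",
    download_urls={"https://example.org/rainfall.csv"},
)
weak_match = DatasetRecord(
    "Stock prices", "Hourly equity quotes",
    download_urls={"https://example.org/rainfall.csv"},
)

print(likely_duplicates(original, mirror))
print(likely_duplicates(original, weak_match))
```

The weights are set so that one matching signal alone never crosses the threshold, which mirrors the point that the signals are imperfect individually.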
Google Knowledge Graph, Scholar and Ranking
The ranking of dataset information involves Google’s Knowledge Graph, which helps understand the context, language, and acronyms related to datasets. It provides a data layer for matching entities such as brands, currencies, or languages within datasets.
The Knowledge Graph enhances the search experience by expanding queries through understanding synonyms, correcting misspellings, and using other relationships.
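A toy version of query expansion can show the general idea, with the caveat that it bears no resemblance to the scale or design of the Knowledge Graph. The sketch assumes a tiny hand-written vocabulary and synonym table (both hypothetical) and uses Python’s standard difflib module for fuzzy spelling correction.

```python
import difflib

# Hypothetical vocabulary and synonym table, invented for this example.
VOCABULARY = ["rainfall", "precipitation", "gdp", "population", "temperature"]
SYNONYMS = {
    "rainfall": ["precipitation"],
    "gdp": ["gross domestic product"],
}


def expand_query(query):
    """Correct near-miss spellings, then append known synonyms."""
    expanded = []
    for word in query.lower().split():
        # Take the closest vocabulary term if it is similar enough.
        matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
        corrected = matches[0] if matches else word
        expanded.append(corrected)
        expanded.extend(SYNONYMS.get(corrected, []))
    return expanded


print(expand_query("rainfal data"))
```

Here the misspelled "rainfal" is mapped to "rainfall" and the synonym "precipitation" is added, while "data" is left unchanged because nothing in the vocabulary is close enough.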
Google Scholar May Be a Ranking Signal
Google Scholar can indicate a dataset’s authority and authorship, aiding in better ranking for datasets and helping prevent unauthorized data usage. It provides insights into a dataset’s importance and visibility by showing where it is referenced or cited in publications.
How Google Ranks Datasets
Because it does not yet have much data on how people search for datasets, Google initially relies on its regular ranking algorithms. It plans to develop a specialized algorithm for dataset search once it has gathered enough usage data. Additional signals, such as metadata quality and citations, also influence dataset ranking.
Optimize Your Datasets
If your site hosts datasets, now is a good time to apply the appropriate Schema.org structured data. Appearing in Google’s rich results could drive traffic to your site instead of your competitors’.
For more detail on this topic, see Google’s AI Blog.