There is a type of link algorithm that isn’t discussed widely enough. This article serves as an introduction to link and link distance ranking algorithms, an aspect that may influence how websites are ranked and one that, in my opinion, is essential to be aware of.
Does Google Use This?
While the algorithm in question is from a patent filed by Google, Google has stated that while they produce many patents and research papers, not all are used, and some are implemented differently than described. However, the details of this algorithm seem to align with Google’s official statements about how it manages links.
Complexity of Calculations
The patent, "Producing a Ranking for Pages Using Distances in a Web-link Graph," highlights the complexity of these calculations:
"Unfortunately, this variation of PageRank requires solving the entire system for each seed individually. As the number of seed pages increases, the computation complexity increases linearly, limiting the number of seeds that can be used practically."
It points out the challenge of making these calculations web-wide due to the enormous number of data points, suggesting that breaking them down by topic niches makes computations easier.
Interestingly, the original Penguin algorithm was calculated once a year or longer, and sites penalized remained so until Google recalculated the Penguin score at the next, seemingly random, date.
At a certain point, Google’s infrastructure apparently improved. Google continually develops its infrastructure, though it doesn’t always announce the changes; the Caffeine web indexing system was a notable exception. Real-time Penguin launched in fall 2016.
These calculations are notably difficult, suggesting that Google might perform a periodic calculation for the entire web, assigning scores based on distances from trusted sites to all other sites. This process sounds much like the Penguin Algorithm.
"The system assigns lengths to the links based on link properties and those of the linked pages. It computes shortest distances from the seed pages to each page based on the links’ lengths. Next, it determines a ranking score for each page using the calculated shortest distances.”
What is the System Doing?
The system generates a score based on the shortest distance between a seed set and the pages being ranked, and uses that score to rank them. This acts as an overlay on the PageRank score to filter out manipulated links, on the theory that such links naturally sit at a longer connection distance from the trusted set than legitimate ones.
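To make that concrete, here is a minimal sketch of the idea as I read the patent: treat the web as a weighted graph, compute the shortest distance from any seed page to every other page in a single multi-source pass, and turn each distance into a score. The toy graph, the link lengths, and the exponential decay are illustrative assumptions on my part, not values from the patent.

```python
import heapq
import math

def distance_scores(graph, seeds):
    """Multi-source Dijkstra: shortest distance from any seed page
    to every reachable page, following weighted (length-assigned) links.

    graph: {page: [(target_page, link_length), ...]}
    seeds: iterable of trusted seed pages (distance 0 by definition)
    """
    dist = {page: 0.0 for page in seeds}
    heap = [(0.0, page) for page in seeds]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, math.inf):
            continue  # stale queue entry
        for target, length in graph.get(page, []):
            nd = d + length
            if nd < dist.get(target, math.inf):
                dist[target] = nd
                heapq.heappush(heap, (nd, target))
    # A page never reached from the seed set gets no score at all.
    return {page: math.exp(-d) for page, d in dist.items()}

# Toy graph: "spam" links in, but has no path FROM the seeds.
web = {
    "seed": [("news", 1.0), ("blog", 2.5)],
    "news": [("blog", 1.0)],
    "spam": [("blog", 0.1)],
}
print(distance_scores(web, ["seed"]))
# {'seed': 1.0, 'news': ~0.37, 'blog': ~0.14} -- no entry for 'spam'
```

Note that a single multi-source run over the whole graph sidesteps the per-seed cost the patent quote above complains about, and pages the seeds can’t reach never enter the result at all.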
Ranking a web page involves three processes: Indexing, Ranking, and Ranking Modification (often related to personalization). This is a simplified view, as more is involved.
This distance ranking happens during the ranking process. Under this algorithm, a page has little chance of ranking for meaningful phrases unless it is connected, directly or through intermediate links, to the seed set.
"One possible variation of PageRank is selecting a few “trusted” pages (seed pages) and discovering other likely good pages by following links from these trusted pages.”
Understanding at which stage the seed set calculation occurs helps in formulating a ranking strategy. This is unlike Yahoo’s TrustRank, which was biased toward the topics best represented in its seed set. Majestic’s Topical Trust Flow is a refined version of the idea, resembling research showing that organizing a seed set by niche topics produces more accurate results. It stands to reason that Google’s distance-ranking algorithm also organizes its seed set by niche topics.
This Google patent describes calculating the distances between a seed set and the rest of the pages on the web, then assigning scores based on those distances.
Reduced Link Graph
"In a variation on this embodiment, the links associated with the computed shortest distances form a reduced link-graph.”
This implies there’s a map of the Internet, known as the link graph, and a smaller version: a link graph filtered of spam pages. Sites that primarily obtain their links from outside this reduced graph may never get in. Thus, dirty links gain no traction.
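To illustrate one plausible reading of "the links associated with the computed shortest distances," the sketch below extends the earlier distance computation to record which link actually won the shortest path to each page. Those winning links form the reduced graph; everything else, including links from unreachable spam pages, simply never appears. This is my interpretation, not a confirmed implementation.

```python
import heapq
import math

def reduced_link_graph(graph, seeds):
    """Return the set of links that lie on a shortest path from the
    seed set -- one plausible reading of the patent's 'reduced
    link-graph'. Links outside this set carry no weight."""
    dist = {page: 0.0 for page in seeds}
    heap = [(0.0, p) for p in seeds]
    heapq.heapify(heap)
    parent = {}  # target -> (source, length) on its shortest path
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, math.inf):
            continue  # stale queue entry
        for target, length in graph.get(page, []):
            nd = d + length
            if nd < dist.get(target, math.inf):
                dist[target] = nd
                parent[target] = (page, length)
                heapq.heappush(heap, (nd, target))
    # The reduced graph is just the winning links.
    return {(src, tgt) for tgt, (src, _) in parent.items()}

web = {
    "seed": [("news", 1.0), ("blog", 2.5)],
    "news": [("blog", 1.0)],
    "spam": [("blog", 0.1)],
}
print(reduced_link_graph(web, ["seed"]))
# {('seed', 'news'), ('news', 'blog')} -- the spam link never appears
```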
What is a Reduced Link Graph?
The concept involves a map of the internet minus certain sites that don’t meet specific criteria. High-quality search results require detecting and minimizing the influence of links that exist for purposes other than conferring authority.
If a site obtains links from reputable sites such as news organizations, it’s reasonable to assume those links sit within the reduced link graph, so obsessing over being part of the seed set itself may not be necessary.
Is This Why Google Says Negative SEO Doesn’t Exist?
"…links associated with the computed shortest distances form a reduced link-graph."
A reduced link graph differs from a link graph, the latter being a map of the Internet organized by link relationships. In contrast, a reduced link graph maps everything minus certain unwanted sites.
This might explain why negative SEO supposedly doesn’t exist; a spam site linking to a normal site won’t negatively impact it because the spam site, being outside the reduced link graph, has no effect. It’s ignored.
Distance from Seed Set Equals Less Ranking Power?
Understanding the seed set’s particulars isn’t crucial. However, awareness of topical neighborhoods and their relation to link acquisition is. Back when Google publicly displayed PageRank scores, these patterns were easier to spot: some sites with low PageRank and low Moz DA might be only a few clicks away from the seed set.
While Moz DA is a useful measure of site authority, it doesn’t reflect distance from a seed set, which remains a Google secret. Using Moz DA is therefore helpful, but it may be beneficial to expand the criteria for what counts as a useful link.
What Does it Mean to be Close to a Seed Set?
A Stanford document suggests several notions of proximity: multiple connections, quality of connection, direct and indirect connections, length, degree, and weight.
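The patent says only that link lengths are assigned "based on link properties and those of the linked pages," without giving a formula. Purely as a hypothetical, the factors above could be folded into a length along these lines, where a higher-quality, more prominent link is shorter and therefore passes more of the seed set’s proximity:

```python
def link_length(quality, out_degree, weight=1.0):
    """Hypothetical link-length formula: shorter = closer to the seeds.

    quality:    0..1 editorial quality of the linking page (assumed signal)
    out_degree: number of outbound links on the page -- a link buried
                among hundreds counts for less, so length grows with it
    weight:     prominence of the link itself (position, relevance)
    """
    if quality <= 0 or weight <= 0:
        return float("inf")  # worthless links are effectively ignored
    return (1.0 + out_degree) / (quality * weight)

# A prominent link from a clean page with 10 outbound links...
print(link_length(quality=0.9, out_degree=10))   # ~12.2
# ...versus a link from a low-quality page with 500 outbound links.
print(link_length(quality=0.2, out_degree=500))  # ~2505.0
```

Degree enters the length directly here; multiple and indirect connections are captured by the shortest-distance computation itself rather than by any single link.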
Takeaway
Concerns over anchor text ratios and the DA/PA of inbound links seem outdated. These are relics of a practice centered on obtaining links from pages with an arbitrarily chosen PageRank score, the number four. It’s useful to make distance ranking part of any discussion about links.
For more information, you can search for the patent with U.S. Patent Number 9165040.