When working on sites with traffic, there is much to lose or gain from implementing SEO recommendations. The downside risk of an SEO implementation gone wrong can be mitigated by using machine learning models to pre-test search engine rank factors. Pre-testing aside, split testing is the most reliable way to validate SEO theories before deciding whether to roll out the implementation sitewide. We will walk through the steps of using Python to test your SEO theories.
Choose Rank Positions
One of the challenges of testing SEO theories is the large sample size required to make the test conclusions statistically valid. Split tests, made popular by Will Critchlow of SearchPilot, favor traffic-based metrics such as clicks, which is suitable if your company is enterprise-level or has ample traffic. If your site doesn't have that enviable luxury, traffic is likely to be a relatively rare outcome metric, meaning your experiments will take too long to reach a conclusion. Instead, consider rank positions. For small- to mid-sized companies looking to grow, pages will often rank for target keywords but not yet high enough to earn traffic. Over the test duration, for each unit of time (for example, day, week, or month), there are likely to be multiple rank position data points across multiple keywords. Compared to traffic (which typically yields less data per page per date), rank position reduces the time required to reach a minimum sample size. Rank position is therefore excellent for non-enterprise-sized clients conducting SEO split tests, allowing them to reach insights much faster.
Google Search Console Is Your Friend
Once you've decided to use rank positions in Google, the data source becomes a straightforward (and conveniently low-cost) choice: Google Search Console (GSC), assuming it's set up. GSC is a good fit here because it has an API that allows you to extract thousands of data points over time and filter for URL strings. While the data may not be the gospel truth, it will at least be consistent, which is good enough.
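As a rough illustration, extracting daily rank position data from the GSC API might look something like the sketch below. It assumes the google-api-python-client and google-auth libraries, a service account key file (gsc_service_account.json), and an example property URL; all of these would need to be swapped for your own setup.

```python
import pandas as pd
from googleapiclient.discovery import build
from google.oauth2 import service_account

# Assumed: a service account JSON key with read access to the GSC property
SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
creds = service_account.Credentials.from_service_account_file(
    "gsc_service_account.json", scopes=SCOPES
)
service = build("searchconsole", "v1", credentials=creds)

def fetch_gsc_positions(site_url, start_date, end_date, row_limit=25000):
    """Pull daily page/query rank position data from the GSC Search Analytics API."""
    rows, start_row = [], 0
    while True:
        response = service.searchanalytics().query(
            siteUrl=site_url,
            body={
                "startDate": start_date,
                "endDate": end_date,
                "dimensions": ["date", "page", "query"],
                "rowLimit": row_limit,
                "startRow": start_row,
            },
        ).execute()
        batch = response.get("rows", [])
        if not batch:
            break
        rows.extend(batch)
        start_row += len(batch)
    # Flatten the API response into a tidy DataFrame
    return pd.DataFrame([
        {
            "date": r["keys"][0],
            "page": r["keys"][1],
            "query": r["keys"][2],
            "impressions": r["impressions"],
            "clicks": r["clicks"],
            "position": r["position"],
        }
        for r in rows
    ])

gsc_df = fetch_gsc_positions("https://www.example.com/", "2024-01-01", "2024-03-31")
```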
Filling In Missing Data
GSC only reports data for dates on which a URL actually received impressions, so you'll need to create rows for the missing dates and fill in the data. In Python, this is a combination of merge() (similar to the VLOOKUP function in Excel) to add the missing date rows per URL, followed by filling in the values you want for those dates. For traffic metrics, that will be zero, whereas for rank positions, that will be either the median (if you assume the URL was ranking when no impressions were generated) or 100 (to assume it wasn't ranking). The code is given here.
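Here is a minimal sketch of that step, assuming a gsc_df DataFrame with date, page, clicks, impressions, and position columns (as in the extraction sketch above); filling with 100 is just one of the two options described.

```python
import pandas as pd

# Assumes gsc_df from the extraction step
gsc_df["date"] = pd.to_datetime(gsc_df["date"])

# Build the full grid of every URL for every date in the test window
all_dates = pd.DataFrame(
    {"date": pd.date_range(gsc_df["date"].min(), gsc_df["date"].max(), freq="D")}
)
all_pages = gsc_df[["page"]].drop_duplicates()
full_grid = all_pages.merge(all_dates, how="cross")

# merge() behaves like Excel's VLOOKUP: attach the observed GSC rows,
# leaving NaN on the dates where a URL generated no impressions
filled = full_grid.merge(gsc_df, on=["page", "date"], how="left")

# Traffic metrics get zero; rank position gets either 100 (assume it
# wasn't ranking) or the URL's median (assume it kept ranking)
filled[["clicks", "impressions"]] = filled[["clicks", "impressions"]].fillna(0)
filled["position"] = filled["position"].fillna(100)
# Median alternative:
# filled["position"] = filled.groupby("page")["position"].transform(
#     lambda s: s.fillna(s.median())
# )
```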
Check The Distribution And Select Model
The distribution of any dataset describes its nature: where the most popular value (the mode) of a given metric, like rank position, sits for a given sample population, and how close the rest of the data points are to the mean or median, i.e., how spread out (or distributed) the rank positions are. This is critical, as it will affect the choice of model when evaluating your SEO theory test. Using Python, this can be done both visually and analytically; visually, by executing this code.
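A minimal sketch of the visual check, assuming the filled DataFrame from the previous step (the bin count and figure size are arbitrary choices):

```python
import matplotlib.pyplot as plt

positions = filled["position"]

plt.figure(figsize=(10, 5))
plt.hist(positions, bins=50, color="steelblue", edgecolor="white")
plt.axvline(positions.median(), color="red", linestyle="--",
            label=f"median = {positions.median():.1f}")
plt.xlabel("Google rank position")
plt.ylabel("Number of URL/keyword/date data points")
plt.title("Distribution of rank positions")
plt.legend()
plt.show()
```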
The resulting chart shows that the distribution is positively skewed, meaning most of the keywords rank in the better (lower-numbered) positions, shown towards the left of the red median line. Now we know which test statistic to use to discern whether the SEO theory is worth pursuing; in this case, there is a selection of models appropriate for this type of distribution.
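The same check can also be made analytically, as noted above. One possible sketch uses scipy's skewness measure and Shapiro-Wilk normality test on the positions series from the previous snippet; the sampling cap and the thresholds in the comments are illustrative rather than prescriptive.

```python
from scipy import stats

skewness = stats.skew(positions)
# Shapiro-Wilk is only reliable up to a few thousand points, so test a sample
sample = positions.sample(min(len(positions), 5000), random_state=42)
shapiro_stat, shapiro_p = stats.shapiro(sample)

print(f"Skewness: {skewness:.2f}")               # > 0 suggests a positive skew
print(f"Shapiro-Wilk p-value: {shapiro_p:.4f}")  # < 0.05 suggests non-normal data
```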
Minimum Sample Size
The selected model can also be used to determine the minimum sample size required. The required minimum sample size ensures that any observed differences between groups are real and not random luck; that is, the difference resulting from your SEO experiment or hypothesis is statistically significant, and the probability of the test correctly detecting that difference is high (known as power). This is achieved by simulating a number of random distributions fitting the above pattern for both test and control groups and running the statistical test on each simulated pair.
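The sketch below illustrates one way such a simulation might look, assuming a Mann-Whitney test (used later in this article) and a gamma-shaped, positively skewed distribution of rank positions; the shape, scale, effect size, number of simulations, and candidate sample sizes are all illustrative assumptions you would tune to your own data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)

def simulated_power(n, effect=0.05, n_sims=200, alpha=0.05):
    """Proportion of simulated experiments reaching significance at sample size n."""
    hits = 0
    for _ in range(n_sims):
        # Positively skewed rank positions, capped between 1 and 100
        control = np.clip(rng.gamma(shape=2.0, scale=8.0, size=n), 1, 100)
        test = np.clip(rng.gamma(shape=2.0, scale=8.0 * (1 - effect), size=n), 1, 100)
        _, p = mannwhitneyu(test, control, alternative="two-sided")
        hits += p < alpha
    return hits / n_sims

# Increase n_sims for a more stable estimate (at the cost of runtime)
for n in [1000, 5000, 10000, 50000, 100000, 220000]:
    print(f"n = {n:>7}: power = {simulated_power(n):.2f}")
```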
When running the code, you will see output relating sample size to statistical power: for each candidate sample size, the proportion of simulated experiments in which significance is reached. Experience has taught me that you can reach significance prematurely, so you'll want to aim for a sample size where significance is likely to hold at least 90% of the time; 220,000 data points are what we'll need here. This matters because, having trained a few enterprise SEO teams, all of them complained of conducting conclusive tests that didn't produce the desired results when the winning test changes were rolled out. Knowing the minimum sample size, and not stopping tests too early, avoids wasted time, resources, and injured credibility.
Assign And Implement
With that in mind, we can now start assigning URLs between test and control groups to test our SEO theory. In Python, we'd use the np.where() function, which works like an advanced IF function in Excel and gives us several options to partition our subjects: by URL string pattern, content type, keywords in the title, or whatever dimension suits the SEO theory you're looking to validate. Use the Python code given here. Strictly speaking, you would run this to collect data going forward as part of a new experiment. But you could also test your theory retrospectively, assuming there were no other changes that could interact with the hypothesis and invalidate the test. That's a bit of an assumption, so keep it in mind!
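As a sketch, the assignment could look like the following; the supported_pages set and the alternative patterns in the comments are hypothetical, standing in for whatever partition your theory requires.

```python
import numpy as np

# Hypothetical: landing pages known to have supporting blog guides linking to them
supported_pages = {
    "https://www.example.com/offers/widget-a/",
    "https://www.example.com/offers/widget-b/",
}

# np.where() works like an advanced IF in Excel: label each row test or control
filled["group"] = np.where(filled["page"].isin(supported_pages), "test", "control")

# Other ways to partition, depending on the theory under test:
# filled["group"] = np.where(filled["page"].str.contains("/product/"), "test", "control")
# filled["group"] = np.where(filled["query"].str.contains("buy|price"), "test", "control")

print(filled["group"].value_counts())
```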
Test
Once the data has been collected, or you're confident you have enough historical data, you're ready to run the test. For our rank position case, we will likely use a model like the Mann-Whitney test, given the skewed, non-normal distribution of the data. However, if you're using another metric, such as clicks, which are Poisson-distributed, you'll need a different statistical model entirely. The code to run the test is given here; once executed, you can print the output of the test results.
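A minimal sketch of the test itself, assuming the grouped DataFrame from the assignment step:

```python
from scipy.stats import mannwhitneyu

test_positions = filled.loc[filled["group"] == "test", "position"]
control_positions = filled.loc[filled["group"] == "control", "position"]

# Mann-Whitney U compares the two groups without assuming a normal distribution
stat, p_value = mannwhitneyu(test_positions, control_positions, alternative="two-sided")

print(f"Test median position:    {test_positions.median():.1f}")
print(f"Control median position: {control_positions.median():.1f}")
print(f"Mann-Whitney U = {stat:.0f}, p-value = {p_value:.4f}")
```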
The results reveal the impact of commercial landing pages supported by blog guides that internally link to them, versus unsupported landing pages. In this case, we showed that offer pages supported by content marketing enjoy a Google rank that is higher by 17 positions on average. The difference is significant, too, at 98%! However, more time is needed to gather more data (another 210,000 data points, specifically), because with the current sample size alone, the significance could still be premature.
Split Testing Can Demonstrate Skills, Knowledge, And Experience
In this discussion, we've covered the process of testing your SEO hypotheses, including the thinking and data requirements for a valid SEO test. By now, you may appreciate that there is much to unpack when designing, running, and evaluating SEO tests. My Data Science for SEO video course explores the science of SEO tests in greater depth (with more code), including A/A and A/B split tests. As SEO professionals, we sometimes take certain knowledge for granted, such as content marketing's impact on SEO performance. Clients often challenge this knowledge, so split-test methods can be valuable for demonstrating your SEO skills, knowledge, and experience.