Null Hypothesis

The premise of A/B testing is to compare the performance of different versions of a website experience. To make that comparison and quantify the difference, you start by defining its opposite – a null hypothesis.

What is a Null Hypothesis?

The null hypothesis is a statistical concept used to assess whether differences in a set of observations are meaningful or due to chance. Two people may show up at the same cafe at the same time, but that doesn’t mean they know each other. Similarly, just because a test version of your eCommerce checkout resulted in more orders than your existing flow, that doesn’t mean the new design is more effective. The improvement could be due to random differences in behavior between your split test audiences.

The null hypothesis is the assumption that the results of your test are due to chance rather than a causal relationship. Put in technical terms, a null hypothesis claims that an independent variable doesn’t have a significant effect on a dependent variable. In the case above, this means the new checkout design (the independent variable) has no significant effect on orders (the dependent variable).

Scientifically speaking, you can’t prove a negative. You can never prove that a null hypothesis is true, but you can disprove it – that is, you can show that a statistically significant difference does exist between the test and control versions. In short, when you test proposed changes to your website, you’re testing whether you can reject the null hypothesis.


Null Hypothesis Example

For example, as you prepare to launch a new banner ad campaign, you might decide to test different designs. As you craft your test, you would formulate the following null hypothesis: “There is no statistically significant difference in click-through rates between different banner designs.” 

If the test results show that one version produces a significantly higher click-through rate, then the null hypothesis would be disproved, and a winning banner ad design selected. On the other hand, if no significant difference is detected, then the null hypothesis can’t be rejected, and you might return to the drawing board to devise an alternative that produces a stronger result.   
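
The banner comparison above can be sketched as a two-proportion z-test. This is a minimal illustration using only the Python standard library; the click and impression counts are made up for the example, and a real testing tool would handle this calculation for you.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference between two click-through rates.

    H0: the two banners have the same underlying CTR.
    Returns the z statistic and the P-value.
    """
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled CTR: the best estimate of the shared rate if H0 is true.
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return z, p_value

# Hypothetical traffic: banner A earned 120 clicks in 4,000 impressions,
# banner B earned 170 clicks in 4,000 impressions.
z, p_value = two_proportion_z_test(120, 4000, 170, 4000)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")  # P-value well below 0.05 here
```

With these made-up numbers the P-value falls below the conventional 5% threshold, so the null hypothesis would be rejected; with closer click counts it would not be.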

Null Hypothesis vs. Alternative Hypothesis

In the cafe example above, the two people who show up at the same time don’t necessarily know each other – but their simultaneous arrival also doesn’t prove that they don’t know each other. This is where the alternative hypothesis comes in.

Whereas the null hypothesis makes a claim of insignificance, the alternative hypothesis claims that a significant relationship exists between the variables. In this regard, the alternative hypothesis stands in opposition to the null hypothesis, stating that the opposite is true. But be careful: we’re not talking about a true/false dynamic. When a null hypothesis is rejected, that doesn’t make the alternative hypothesis true by default.

Using the earlier banner ad example, consider these statements: 

  • There’s no difference in click-through rates between the two banners (null hypothesis).
  • The banner with motion graphics has a higher click-through rate than the banner with a static image (alternative hypothesis).

Even if the null hypothesis is rejected, you should further test the alternative hypothesis with more variations before confidently accepting it. 

Failing to Reject the Null Hypothesis

A null hypothesis can be rejected – but sometimes, test results aren’t conclusive. In those instances, the results fail to reject the null hypothesis.

Usually, failure to reject the null hypothesis means you go back to the drawing board. A fresh test with clearer differences between the control and test versions may result in more decisive results.

For example, when testing email subject lines for the final stages of a travel sale event, the null hypothesis is that there’s no difference in open rates between the email subject lines “Sale Ends Soon” and “Get Packing – Buy Now!” At the end of the test, if the difference in open rates between the two subject lines is negligible, you can’t reject the null hypothesis. 

Thoughtful test design helps ensure that your test results are accurate and you can confidently either reject the null hypothesis or conclude the null hypothesis can’t be rejected. If your test contains built-in flaws, it may produce a Type 1 or Type 2 error. 

  • A Type 1 error incorrectly rejects the null hypothesis – that is, the test produces a false positive. You could end up adopting a website change that doesn’t improve performance or, worse, causes performance to drop. 
  • A Type 2 error incorrectly fails to reject the null hypothesis – that is, the test produces a false negative. You could end up keeping the status quo even when the test version would, in fact, be more effective. 
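
One way to build intuition for Type 1 errors is an A/A simulation: run the same test on two audiences drawn from the same population, so the null hypothesis is true by construction, and count how often it is falsely rejected. The sketch below uses only the Python standard library, and the conversion rate and trial counts are arbitrary assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def false_positive_rate(trials=1000, n=1000, ctr=0.05, alpha=0.05, seed=7):
    """Simulate A/A tests where both arms share the same true CTR.

    Because H0 is true by construction, every rejection is a Type 1
    error; the observed rejection rate should hover near alpha.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        clicks_a = sum(rng.random() < ctr for _ in range(n))
        clicks_b = sum(rng.random() < ctr for _ in range(n))
        p_pool = (clicks_a + clicks_b) / (2 * n)
        se = sqrt(p_pool * (1 - p_pool) * (2 / n))
        z = (clicks_b - clicks_a) / n / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / trials

print(false_positive_rate())  # close to 0.05, the chosen alpha
```

The rejection rate landing near 5% is the point: even when nothing differs between the versions, roughly one test in twenty will look significant at a 95% confidence level.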

How to Calculate the Null Hypothesis 

When you design an A/B test, it’s essential to build in the components that will lead to valid results. That starts with a clear statement of the null hypothesis. 

The Null Hypothesis Symbol 

If you want to state your test mathematically, the standard notation for the null hypothesis is H0. Similarly, the symbol for the alternative hypothesis is Ha (occasionally written as H1).

Here’s how the null hypothesis and alternative hypothesis are communicated in a research context:

  • H0: There is no difference in order size between customers who receive a discount offer and those who don’t.
  • Ha: Customers who receive a discount offer place larger orders than customers who don’t.

Calculating Whether to Reject the Null Hypothesis

Take these steps to maximize your chances for a decisive verdict on rejecting the null hypothesis: 

  1. Define your null hypothesis and alternative – Write out your null hypothesis and the alternative, citing specific metrics you’ll use to measure the results. 
  2. Set your significance threshold – Choose a target confidence level for your test results – a percentage representing the certainty that the results are due to the variables being tested and not to random chance. Most A/B tests aim for a confidence level of at least 95%. The significance level is 1 – C, where C is the confidence level, so the standard is 5%. A lower significance level sets a stricter bar for rejecting the null hypothesis, reducing the risk of mistaking a chance result for a real effect.
  3. Set the test power and sample size – The power of a test is the probability that it will correctly reject a false null hypothesis. Sample size affects power, so ensure your audience is large enough to gather sufficient data on both the independent and dependent variables. If necessary, extend the test duration to gather enough participants. Monetate and other testing tools have built-in calculators to help determine the right sample size; online sample-size calculators are also available.
  4. Run the test to completion – Identify the most appropriate statistical test, e.g., a t-test or chi-square test, to measure your results. It can be tempting to end the test early if results appear dramatic, but it’s crucial to keep it running for the predetermined duration to avoid errors.
  5. Calculate the test P-value – To determine whether you can reject the null hypothesis, you need the probability that your results arose by chance. This probability is expressed as a number, the P-value. If your testing tool doesn’t compute the P-value for you, consult a P-value calculator available online for the type of test you’ve run.

Then, compare the P-value to the significance level you set before running the test.

  • A P-value lower than the significance level indicates a statistically significant finding, and the null hypothesis should be rejected. In other words, using the example of order size and customers who received a discount offer, it’s unlikely that the difference in order size is due purely to chance.

  • A P-value greater than the significance level indicates a statistically insignificant finding and your null hypothesis can’t be rejected. In other words, it’s possible that there isn’t a relationship between order size and the discount offer. 
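
As a worked example of steps 4 and 5, here’s how the discount-offer comparison might be evaluated in code. This sketch uses a one-sided large-sample z-test on mean order sizes (a t-test would be more appropriate for small samples) and synthetic data, since the real numbers would come from your testing tool.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def one_sided_mean_test(control, treatment, alpha=0.05):
    """Large-sample z-test of H0 (no difference in mean order size)
    against Ha (the treatment mean is greater than the control mean)."""
    se = sqrt(stdev(control) ** 2 / len(control)
              + stdev(treatment) ** 2 / len(treatment))
    z = (mean(treatment) - mean(control)) / se
    p_value = 1 - NormalDist().cdf(z)  # one-sided, matching Ha
    return p_value, p_value < alpha    # True means "reject H0"

# Synthetic order sizes: no-offer customers average around $50,
# discount-offer customers around $56, both with the same noise.
rng = random.Random(42)
no_offer = [rng.gauss(50, 12) for _ in range(400)]
with_offer = [rng.gauss(56, 12) for _ in range(400)]

p_value, reject_h0 = one_sided_mean_test(no_offer, with_offer)
print(f"P-value = {p_value:.4g}, reject H0: {reject_h0}")
```

Because the synthetic data is built with a real underlying difference, the P-value comes out below 0.05 and the null hypothesis is rejected; on real data, either outcome is possible.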

Remember that a rejected null hypothesis doesn’t mean your alternative hypothesis is necessarily true, but you can proceed to test the idea further. 
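
The sample-size guidance in step 3 can also be sketched with the standard normal-approximation formula for comparing two proportions. Treat this as a rough guide: the baseline and target rates below are illustrative assumptions, and real testing tools may apply additional corrections.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(p_baseline, p_target, alpha=0.05, power=0.8):
    """Approximate visitors needed in each arm to detect a lift from
    p_baseline to p_target with a two-sided test at the given alpha
    and power."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_power = nd.inv_cdf(power)          # ~0.84 for 80% power
    variance = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    n = (z_alpha + z_power) ** 2 * variance / (p_target - p_baseline) ** 2
    return ceil(n)

# Detecting a lift from a 3% to a 4% conversion rate
# at 95% confidence and 80% power:
print(sample_size_per_arm(0.03, 0.04))  # several thousand visitors per arm
```

Note how sensitive the result is to the size of the lift you want to detect: halving the expected lift roughly quadruples the required sample, which is why small expected effects demand long-running tests.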

Continuous testing and analysis will guide how you fine-tune your customer interface and personalization so that you can maximize the effectiveness of future offers to boost your eCommerce success.

How is the Null Hypothesis Used in A/B Testing?

In A/B testing, the null hypothesis is a critical, statistically rigorous tool for marketers to validate assumptions about customer behavior and guide decision-making.

By testing your theories about website optimization against the standard of the null hypothesis, you can weed out variables that aren’t having a significant impact on customer behavior so you can focus on high-impact outcomes.