Content Validity

Thoughtful experiment design in eCommerce testing ensures that your investment produces actionable results. As part of the process, defining the scope of the test is essential to avoid vague hypotheses and failed results. Content validity is a useful tool to assess whether the elements of your experiment are relevant and cover the full extent of ideas in your hypothesis. Using content validity, you can ensure you’re focusing on exactly what you intend to test.

What is Content Validity?

Content validity ensures that your measurement tool is designed correctly to capture comprehensive data relevant to your question or hypothesis. When you evaluate the content validity of a proposed experiment, survey, or test, you determine whether it assesses all the facets of what it is intended to measure. Confirming the content validity of your test is an important preliminary step to ensure that your testing efforts will produce accurate, actionable results. 

Real-World Examples of Content Validity

Content validity is often used for tests that attempt to measure constructs – theoretical concepts or ideas – that are not directly quantifiable. As a content validity example, a driver’s test attempts to measure whether an individual is qualified to earn a driver’s license and operate a car. The ability to drive is the construct being measured, and a test with high content validity would cover all the factors that go into being a good driver, from knowledge of rules of the road to the ability to physically execute routine driving maneuvers. 

In the world of digital marketing, you can apply this content validity definition to A/B and multivariate tests. By evaluating for content validity, you can determine whether your proposed tests capture the full scope of potential solutions and their impacts on the user experience. 

For example, if you want to know whether changing the position of the customer ratings summary on the product page would impact conversion, a test with a high degree of content validity would likely include an array of layout alternatives and designs, and would track not only single-session conversions, but conversions over time. 

4 Types of Measurement Validity

While content validity is a helpful way to assess the quality of your test, it’s not the only method. There are four main types of validity to consider: 

1. Content Validity

Evaluation of the scope of your measurement tool – whether it completely assesses the different elements of your construct. 

2. Face Validity

To establish face validity, you need only determine whether your tool appears “on the face of it” to measure what you intend to test, using nothing more than your judgment and estimation. This type of validity is considered more subjective and less scientific than the other methods.  

3. Construct Validity

Construct validity applies to whether a test or tool genuinely measures the underlying concept it’s meant to capture. Put another way, construct validity evaluates whether the test metrics are appropriate for the construct.

4. Criterion Validity

A test’s criterion validity relates to the quality of the test data – its accuracy as a means of measuring the concept or idea you want to capture. Comparing your test’s measurements and results with other valid research is one way to determine the criterion validity; if your data aligns as expected, that’s a sign of validity.

While these validity types assess separate aspects of a test, they can be hard to distinguish from one another in the abstract. To sharpen the differences between them, here’s how they compare: 

Face Validity vs. Content Validity 

Face validity is solely your subjective impression of whether a test is valid overall, while content validity assesses the completeness of the test. Using the example of placement of customer ratings on a product page, you might assess the face validity of your test based on whether it seems to track the conversion rate of the existing design versus the proposed change. To achieve content validity, however, the approach would be more rigorous, ascertaining that the test measured the impact of multiple potential layouts and conversion rates over different time periods.

Content Validity vs. Construct Validity

When considering construct vs. content validity, the two terms are somewhat interrelated, but hardly interchangeable. To gauge construct validity, you examine whether the metrics the test uses are actually appropriate for measuring the concept or hypothesis being tested. By contrast, content validity involves how comprehensively and completely the test measures the construct. In the case of the customer reviews, the test would achieve construct validity if you determined that conversion rate was, in fact, the best method for measuring the effectiveness of the new layouts. Content validity would assess whether the test captured the entirety of behaviors related to conversion and an array of layout choices – the full gamut of hypothetical scenarios. 

Criterion Validity vs. Construct Validity

Construct validity evaluates whether the metrics used within a particular measurement tool align with the concept or hypothesis being tested. Criterion validity assesses whether the test accurately deploys the metrics to capture data and report results. When testing placement of consumer ratings, an evaluation of construct validity would examine whether conversion rate was the right metric to tabulate to determine the effectiveness of layout changes, while the test would achieve criterion validity if high quality measurement produced accurate results that tracked with other valid research.

How Do You Measure Content Validity?

Content validity is an abstract concept, without directly quantifiable metrics of its own, so assessing it can be subjective. Deciding whether a test measures every aspect of the concept or question at hand is a judgment call. But you can boost the objectivity of your content validity assessment by bringing in other evaluators with expertise in the subject matter. A few leading methods for evaluating content validity include:

  • Focus groups: Prior to designing the test, gather subject-matter experts and ask for their input on measuring the construct you’ve identified. Use their guidance to create a test that addresses all aspects of the topic.
  • Panel of experts: Ask subject-matter experts to review your test or measurement instrument and evaluate its validity as a tool relative to the concept or hypothesis at hand. This widely accepted method is used to derive the content validity ratio, described below, which enables you to calculate a quantitative content validity score for your test.
  • Pilot testing: Ask a small group of subjects to use the test and study the results to assess whether they comprehensively address the construct. Adjust the test as needed based on those initial results before releasing it more widely.
  • Literature review: Review other experiments that have been validated and conducted, and study their content and structure to inform your own test design.

How Is Content Validity Established?

To maximize your chances of creating a valid test, you can integrate multiple methods of establishing content validity into your experiment design process. Steps to take include:

1. Define the construct

  • Establish the bounds of your experiment by clearly defining the topic you’re going to test. What are the key elements of the concept or potential factors that could impact results? 
  • In the customer review example above, the construct might be the impact on conversion of the prominence of customer ratings on the product detail page. This construct precisely defines the scope of inquiry; the test will not explore in depth the content of customer reviews, or whether displaying a discount for items improves their conversion. We’re focused on positioning the customer rating for maximum impact.

2. Do up-front research

  • Before designing your test, consider a literature review or focus group to gather input on what a valid experiment might look like. The ease and speed of modern eCommerce testing means that you can iterate and run multiple tests to arrive at optimal results, but it’s still a good idea to read up on best practices for testing the construct you have in mind and to check with colleagues to confirm you’re on the right track. 

3. Design your instrument

  • Based on the construct you’ve identified, create a set of potential items to include in the test. For the customer rating example, you could create layouts featuring the customer rating in different positions. In addition, define your measurement methodology and timeframe. Draw up a list of metrics or possible results and set the test timeframe so you maximize chances of having a statistically valid sample. 

4. Leverage experts to derive a content validity ratio (CVR)

  • Ask subject-matter experts to rate each item in your test for completeness, relevance, and clarity. Using a scoring method developed by psychologist C.H. Lawshe, they should determine whether each item is “essential,” “useful, but not essential,” or “not necessary.” Calculate the content validity ratio for each item as follows:
    • CVR = (ne – N/2) / (N/2)
    • where “ne” is the number of experts who rated the item “essential” and N is the total number of experts.
  • The formula produces results in a range of -1 to +1, with values above 0 indicating that more than half of the experts deemed the test component “essential.” The closer the score is to 1, the more essential that test item is.
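Lawshe’s formula can be sketched in a few lines of Python. This is a minimal illustration; the panel size and the rating counts below are hypothetical.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR: (ne - N/2) / (N/2), which ranges from -1 to +1."""
    half = n_experts / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 experts rating one test item
print(content_validity_ratio(8, 10))   # 8 of 10 said "essential" -> 0.6
print(content_validity_ratio(5, 10))   # exactly half -> 0.0
print(content_validity_ratio(10, 10))  # unanimous -> 1.0
```

A score of 0.0 marks the halfway point: any positive value means a majority of the panel rated the item essential.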

5. Calculate the content validity index (CVI)

  • After scoring the individual elements, average them to derive the overall content validity index of your test – a single numerical rating for the entire test.
  • The minimum value to aim for depends on the number of experts you consult; if you rely on just a few expert evaluations, more of them need to agree a facet of the experiment is “essential” in order for it to be valid. Consult the table of validity thresholds established by Lawshe to check your results.
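As a minimal sketch of this step, the CVI is simply the mean of the per-item CVR scores. The item scores and panel size below are hypothetical, and the threshold values are a commonly cited subset of Lawshe’s table of critical CVRs; consult the full published table for other panel sizes.

```python
def content_validity_index(cvrs):
    """CVI: the mean of the per-item content validity ratios."""
    return sum(cvrs) / len(cvrs)

# Hypothetical CVRs for four test items, rated by a 10-expert panel
item_cvrs = [0.8, 0.6, 1.0, 0.4]
cvi = content_validity_index(item_cvrs)
print(round(cvi, 2))  # -> 0.7

# Commonly cited critical CVR values from Lawshe's table, by panel size
LAWSHE_MINIMUM = {5: 0.99, 10: 0.62, 15: 0.49, 20: 0.42, 40: 0.29}

# With 10 experts, only items at or above 0.62 would be retained
retained = [c for c in item_cvrs if c >= LAWSHE_MINIMUM[10]]
print(retained)  # -> [0.8, 1.0]
```

Note how the smaller the panel, the higher the required CVR: with only 5 experts, an item needs near-unanimous “essential” ratings to clear the bar.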

6. Launch a pilot test

  • Launch your test to a small group and assess the results to ensure alignment with your experiment’s goals.

7. Evaluate, adjust, and launch

  • Based on the results of the pilot, make any adjustments needed before launching more widely to your full test audience.

8. Analyze and iterate

  • Circumstances change over time, so a test that was valid once may not remain valid indefinitely. For example, your initial customer ratings experiment may have generated actionable results and a conversion lift. But a couple of years from now, the majority of customer ratings may be in video format or appear on social networks rather than eCommerce sites, requiring you to reset the terms of your test before conducting it again.

What Is Content Validity in A/B Testing?

While content validity is most often applied to the design of tests, quizzes, and evaluation tools, it has a practical application in the realm of A/B testing as well. 

When you evaluate the content validity of your hypothesis and proposed experiment prior to A/B testing, you ensure that your test will produce comprehensive data. Content validity ensures that the page elements and the metrics you track reflect the full extent of your test goals. 

While it takes time to confirm the content validity of tests prior to launch, the payoff can be more actionable and definitive results. With more effective tests, you can achieve growth more efficiently and help your business reach new levels of success.