Types of Statistical Validity: What You’re Measuring and How to Do It

Statistical validity is one of those things that is vitally important in conducting and consuming social science research, but less than riveting to learn about. It doesn’t help that people use the term “validated” very loosely. In a health coaching context, I hear mention of “validated instruments” and “validated outcomes” without a consistent meaning behind the terms.

In fact, there are many types of validity, and depending on what you want to do with your data, you may need to establish validity in several different ways. At a high level, saying a measure is valid means that, statistically, it is measuring what it is supposed to measure in a stable, meaningful way.

Way back when, I was a Graduate Student Instructor for an Advanced Methods course for undergraduates. One of the topics on the syllabus was types of validity, and naturally, it was one the students struggled with. True confession: It wasn’t easy for me either; even being fairly far down my graduate school path, I wasn’t fully comfortable with all of the types of validity.

So, I created a handout cheat sheet summarizing the different types of validity taught in the course, including the type of data one might collect to achieve it. The cheat sheet is below, and I am happy to share a PDF version if you want to reach out to me about it. I hope it helps you in your data endeavors.

| Type of Validity | Definition | How to Achieve It |
| --- | --- | --- |
| Statistical | Do the variables actually covary? For example, do scores on a self-esteem test covary with “actual” self-esteem? | Comparison of the results of the measure with actual self-esteem; impossible to achieve in practice, but can be corrected for statistically |
| Internal | Did what you did in the study cause the results, i.e., did your manipulations affect the outcomes? | Ruling out alternative causes; replication with different populations and settings |
| Construct | Does your measure correspond to the theoretical construct it is supposed to measure? | Avoiding threats such as poorly developed operational definitions, expectancy effects, reliance on a single study design or population, and third variables |
| Face | Do the questions appear to relate to the construct of interest? | Intuition; does it look right to you? |
| Content | Is the question measuring what you want it to measure? | Expert opinion; a careful matching of operational definitions to construct definitions |
| Criterion | Does your measure predict real-world outcomes related to the behavior of interest? | Data on actual behavior; does it correspond to scores on your measure? |
| Known-groups | Does the measure distinguish between groups known to differ on the critical behavior? | Administer the measure to groups that show different levels of the behavior of interest and compare scores |
| Convergent | Does the measure or questionnaire show correlations between the behavior of interest and related behaviors? | Correlate with related measures (high but not perfect correlations desired); compare outcomes with related outcomes (see the code sketch after this table) |
| Divergent | Does the measure or questionnaire differentiate the behavior of interest from unrelated behaviors? | Correlate with measures of unrelated constructs (low correlations desired); compare outcomes with unrelated outcomes |
| External | The degree to which a study’s conclusions hold for other people, in other places, and at other times; generalizability | Random sampling; generalizing carefully (not making broad claims); replicating the study in multiple settings |
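For readers who like to see the numbers, here is a minimal sketch of how a few of the checks in the table might be computed in Python. It is not part of the original cheat sheet: the data are simulated, and names like `new_scale`, `related_scale`, `unrelated_scale`, and `group` are hypothetical placeholders for whatever your instrument actually collects.

```python
# Minimal sketch (simulated data, hypothetical variable names) of three checks
# from the cheat sheet: convergent, divergent, and known-groups validity.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 200

# Simulated scores: a new self-esteem scale, an established related scale,
# and a scale measuring an unrelated construct.
true_self_esteem = rng.normal(0, 1, n)
new_scale = true_self_esteem + rng.normal(0, 0.5, n)      # the new instrument
related_scale = true_self_esteem + rng.normal(0, 0.7, n)  # established, related measure
unrelated_scale = rng.normal(0, 1, n)                     # unrelated construct

# Convergent validity: the new scale should correlate highly (but not perfectly)
# with an established measure of the same or a closely related construct.
r_conv, p_conv = stats.pearsonr(new_scale, related_scale)
print(f"Convergent:   r = {r_conv:.2f} (p = {p_conv:.3f})  -- want high but not ~1.0")

# Divergent validity: the new scale should correlate weakly with measures of
# unrelated constructs.
r_div, p_div = stats.pearsonr(new_scale, unrelated_scale)
print(f"Divergent:    r = {r_div:.2f} (p = {p_div:.3f})  -- want low")

# Known-groups validity: groups known to differ on the behavior of interest
# should differ on the measure. Here group 1 is simulated to score higher.
group = np.repeat([0, 1], n // 2)
scores = new_scale + 0.8 * group
t, p_kg = stats.ttest_ind(scores[group == 1], scores[group == 0])
print(f"Known-groups: t = {t:.2f} (p = {p_kg:.3f})  -- want a clear group difference")
```

In practice you would run checks like these on real responses and judge the correlations against the conventions of your field, not against a single fixed cutoff.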