Statistical validity is one of those things that is vitally important in conducting and consuming social science research, but less than riveting to learn about. It doesn’t help that people use the term “validated” very loosely. In a health coaching context, I hear mention of “validated instruments” and “validated outcomes” without a consistent meaning behind the terms.
In fact, there are lots of types of validity, and depending on what you want to do with your data, you may need to establish validity in several different ways. At a high level, saying a measure is valid means that it measures what it's supposed to measure in a stable, meaningful way.
Way back when, I was a Graduate Student Instructor for an Advanced Methods course for undergraduates. One of the topics on the syllabus was types of validity, and naturally, it was one the students struggled with. True confession: it wasn’t easy for me either. Even though I was fairly far along in graduate school, I wasn’t fully comfortable with all of the types of validity.
So, I created a handout cheat sheet summarizing the different types of validity taught in the course, including the type of data one might collect to achieve it. The cheat sheet is below, and I am happy to share a PDF version if you want to reach out to me about it. I hope it helps you in your data endeavors.
Type of Validity | Definition | How to Achieve It |
---|---|---|
Statistical | Do variables actually covary? For example, do scores on a self-esteem test covary with “actual” self-esteem? | Comparison of the results of a measure with actual self-esteem; impossible to achieve in practice, but can be corrected for statistically |
Internal | Did what you did in the study (your manipulations) actually cause the outcomes you observed? | Ruling out alternative causes; replication with different populations and settings |
Construct | Does your measure correspond to the theoretical construct it is supposed to represent? | Guarding against threats, including poorly developed operational definitions, expectancy effects, reliance on a single study design or population, and third variables |
Face | Do the questions appear to relate to the construct of interest? | Intuition: does it look right to you? |
Content | Is the question measuring what you want it to measure? | Expert opinion; a careful matching of operational definitions to construct definitions |
Criterion | Does your measure predict real-world outcomes related to the behavior of interest? | Data on actual behavior; does it correspond to scores on your measure? |
Known-groups | Does the measure distinguish between groups known to differ on the critical behavior? | Administer measure to groups that show different levels of behavior of interest and compare scores. |
Convergent | Do scores on the measure or questionnaire correlate with measures of related behaviors? | Correlate with related measures (high but not perfect correlations desired); compare outcomes with related outcomes |
Divergent | Does the measure or questionnaire differentiate the behavior of interest from other behaviors? | Correlate with measures of unrelated behaviors (low correlations desired); compare outcomes with unrelated outcomes |
External | The degree to which a study’s conclusions hold for other people, places, and times; generalizability | Random sampling; generalizing carefully (not making broad claims); replication of the study in multiple settings |