Want to get a bunch of social scientists to argue? Ask them to talk about validity in social research: what counts as validity, and what doesn’t. Then sit back, grab your popcorn, and pretend it’s television. Because it’s going to get good. Not Game of Thrones head-chopping good, but you can expect sparks to fly, and you may even hear some cutting insults about poor methodology.

Validity has been a concern with assessments as far back as 1896, when the Pearson correlation coefficient was introduced to quantify the relationship between two variables. The field has grown since then and gotten quite complicated, enough so that in some graduate programs – mine included – there are entire courses devoted just to validity.

What is validity, exactly? The Cambridge Dictionary has as its primary definition “the quality of being based on truth or reason, or of being able to be accepted.” Hmmmm. Based on “truth” or “reason.” That’s a bit vague because “truth” is so subjective. What’s true for one isn’t always true for another. However, I think that works just fine for me and my views on validity in assessment. More on that shortly.

I teach undergraduate and graduate research methods (mainly because no one else wants to; it isn’t the most exciting class). When I get to reliability and validity in the course, I describe validity as a measure of how accurately a test or assessment measures a defined concept. In other words, is the test or assessment actually measuring what you’re trying to assess? That’s the most basic definition, and it gets much more complicated from there. Still, for the layperson, it provides a solid starting point.

Now, you might be wondering why this matters to you, dear reader. Well, if you’re considering purchasing an assessment for your organization, understanding validity is critical to making an informed decision. It’s about ensuring that the assessment accurately measures what you’re trying to assess, which can significantly affect the effectiveness of your evaluation process. Let me tell you what our position is, and why, and see if you agree!

There are multiple types of validity, each with a specific focus and method. Here’s a quick list with some very, very basic definitions. Do try to stay awake.

  1. Construct Validity: Deals with whether the test truly measures the construct it purports to measure. This is the basic form of validity I described above, the one I start with in my introduction to research methods.
  2. Content Validity: Refers to the extent to which an assessment represents all facets of a given construct. In other words, does the assessment cover all the relevant content it should to be comprehensive? Think of a simple Venn diagram with two circles – one is all the content about a topic, and the other is the content your assessment actually covers; the more they overlap, the greater your content validity.
  3. Criterion-related Validity: This type of validity focuses on relationships between the scores on one assessment and an external criterion. It includes two types of validity: concurrent, where scores are compared with an external criterion measured at the same time, and predictive, where scores are used to predict a future criterion (see the sketch just after this list).
  4. Face Validity: Relates to whether the assessment appears, in a superficial sense, to measure what it purports to measure. It is less about statistical strength than about acceptance and credibility in the eyes of the participant.
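If you like to see how the numbers actually work, criterion-related validity is usually quantified as a correlation – often the very Pearson coefficient from 1896 I mentioned earlier – between assessment scores and the criterion. Here’s a minimal sketch in Python; the scores and ratings below are hypothetical numbers I invented purely for illustration.

```python
# A minimal sketch of criterion-related validity, using invented data.
# Concurrent validity: correlate assessment scores with a criterion measured
# at the same time (e.g., current manager ratings). For predictive validity,
# the criterion would be measured later (e.g., ratings a year after hiring).
from scipy.stats import pearsonr

assessment_scores = [72, 85, 90, 65, 78, 88, 60, 95, 70, 82]          # hypothetical
manager_ratings = [3.1, 4.0, 4.3, 2.8, 3.5, 4.1, 2.5, 4.6, 3.0, 3.8]  # hypothetical

# Pearson's r measures how strongly the two sets of scores move together;
# the closer r is to 1.0, the stronger the criterion-related validity evidence.
r, p_value = pearsonr(assessment_scores, manager_ratings)
print(f"Validity coefficient: r = {r:.2f} (p = {p_value:.4f})")
```

In real selection research, coefficients nowhere near 1.0 are the norm; the point of the sketch is simply that criterion-related validity boils down to a correlation you can compute and inspect.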

Those are the “classic” forms of validity that have been taught for years. Assessment development often considers more than one of them; establishing content validity first, followed by data collection and construct validation, is a common sequence, for example. The sketch that follows shows one simple way that construct-validation step plays out with data.
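Here’s a minimal, hypothetical sketch of that check: do the items statistically group into the factors the assessment was designed around? The item responses below are invented, and the eigenvalue rule of thumb shown is just one of several ways to answer the “how many factors?” question.

```python
# A hypothetical sketch of one construct-validation check: do the items
# statistically group into the constructs the assessment was designed around?
import numpy as np

# Invented responses: 6 respondents x 4 items. Items 1-2 were written to tap
# one construct and items 3-4 another. Real studies use far larger samples.
responses = np.array([
    [5, 4, 1, 1],
    [4, 4, 4, 5],
    [2, 2, 5, 4],
    [1, 2, 2, 2],
    [5, 5, 4, 4],
    [2, 1, 1, 2],
])

# Correlate the items, then take the eigenvalues of the correlation matrix.
# Eigenvalues greater than 1 (the Kaiser rule of thumb) suggest how many
# factors the items form; with this made-up data, two exceed 1, consistent
# with the two constructs the items were designed around.
corr = np.corrcoef(responses, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted largest first
print("item correlations:\n", np.round(corr, 2))
print("eigenvalues:", np.round(eigenvalues, 2))
```

With real assessments this is done on much larger samples, usually with dedicated factor-analysis software, but the underlying question is the same: do the items hang together the way the construct says they should?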

But in recent years, validity has started to take on a more practical definition. Instead of being about math and statistics, it’s about use. There are two types of this practical validity to consider, and I’ll explain them in a little more detail.

  1. Ecological Validity: Concerns whether the findings from the assessment can be generalized to real-life settings. It emphasizes the practical application of assessment results in real-world environments. For example, researchers might conduct a study to understand how interactive whiteboards impact student learning in a classroom. For the study to have ecological validity, it would need to be done under real-world conditions (i.e., in real classrooms with real students). If it produces positive results under those conditions, it has ecological validity: the approach has been demonstrated to work in the real world.

  2. Consequential Validity: Pertains to understanding the consequences or outcomes of the assessment results. It involves assessing the implications, intended or unintended, of using the assessment results, and is a more contemporary view, focusing on the ethics and social implications of assessment. For example, researchers might develop a test to select students for medical school based on their academic performance. They may very well end up with a test that meets the other criteria for validity we’ve discussed and leads to better selection for medical school, but has an ugly side effect: wealthier students invest in test preparation classes to raise their scores, while poorer students who can’t afford such help are selected less often. That wasn’t an intended effect, but it happens, and it has ethical implications. This may sound far-fetched to you, but it is exactly the subject of a study by Kumar, Roberts, Bartle and Eley (2018) in volume 23 of the journal Advances in Health Sciences Education.

If you’ve made it through all that’s been covered here, you’ve made it through a semester’s worth of validity coursework in just a few minutes. Bravo! Now you’re probably wondering what the point is. Well, here it goes.

Considering what companies are trying to achieve through organizational assessments, I really don’t think all forms of validity matter that much. Heresy, you say! Fie and shame on me!

Before you burn me at the stake, though, consider this. I do believe content validity is critical; you need good content to ask good questions. Ecological and consequential validity are also important: clients who use an assessment in a real setting know it will have real consequences for their employees. These things matter far more than whether two questions load onto the same factor or are highly correlated. Assessment users want practicality and applicability. They want an assessment that helps them understand a question they have in a more systematic way than anecdotal data can provide, and they want to feel like they can use what they learn in a safe, practical way. If they’re purchasing an assessment, it’s for a reason, and they want to be sure that reason is being addressed. Remember that definition from the Cambridge Dictionary, and how it uses the word “truth”? Well, this is exactly why that definition works for me. Because the truth of an assessment isn’t in mathematical relationships; it’s in how well it addresses the questions it was intended to answer.

I’ve been working with assessments for a long time. And in all that time, I’ve never had a talent management or C-suite client ask me “what eigenvalues did you find on these items?” or “how many factors did you find in a confirmatory factor analysis?” Nope. They ask things like “how can I use what I’ve learned with this” or “will this tell me where I have areas of weakness in my organization that I should pay attention to?”

To me, the real value of an assessment isn’t in all the analytics that go into making it statistically sound; it’s in the analysis of the data after it’s collected and the insights that analysis provides. It’s really about what matters to the client: are the insights gleaned applicable, and do those insights drive actions that will impact the organization in a positive way?

I know that any measurement professional or statistician reading this is probably cracking their knuckles in preparation for sending me a hotly worded nastygram in response. But you know, I’m not alone in this view. More assessment professionals are beginning to embrace the idea that a truly valid assessment is one whose results provide useful insights that lead to positive, beneficial outcomes for the user. I think if you ask most talent management or human resource professionals what’s important to them, they will agree. For me, as the architect of the assessment practice at Talent Dimensions, assessment design is about client focus, not statistical robustness. Our point of view is that a good assessment helps you answer important questions and make better decisions. Any assessment that provides that outcome is valid in my book.

Sources

Kumar, K., Roberts, C., Bartle, E., & Eley, D. (2018). Testing for medical school selection: What are prospective doctors’ experiences and perceptions of the GAMSAT and what are the consequences of testing? Advances in Health Sciences Education, 23, 533–546.