Fairness in Educational Testing

Oct 01, 2020

The Role of Values in Addressing Fairness in Educational Testing Purpose, Use, and Consequences

What Is Fairness in Educational Testing? 

We take for granted that we know what is fair and what is not, but fairness means different things to different people, in different contexts, at different times. Fairness is amorphous, often defined using related but non-equivalent terms such as equity, justice, and absence of bias. We accept these definitions, and yet each of those terms is just as difficult to define. Such ambiguity inevitably leads to circumstances in which what we label as fair is based on our own value systems. To the extent that we do not share the same values, determining what is fair can be more than just difficult. It can be an outright dilemma. Because there really is no universally fair outcome, we are left with fairness arguments that can be sources of deep divisions. We might even imagine some extreme examples where individuals or communities hold values that represent a threat to others, or that restrict access to certain benefits in unreasonable ways.

In practice, fairness is often defined synonymously with equity. In educational testing, equity in the opportunity to learn and to demonstrate that learning is fundamentally valued and most often assumed during test construction, even where evidence suggests otherwise. On a more technical level, equity often refers to conditions where a test’s measurement properties are similar across different examinee groups—that the scores produced are not biased in favor of or against any student groups. 

We do not, however, require equal test scores. To do so would be to say that we do not need the test, for if the purpose of educational testing is to measure how much a student has learned, then the same score for all students would make the results meaningless and unnecessary. We expect students to perform differently. Of course, what concerns us is when differences in student outcomes fall across student groups that share traits such as gender, race, ethnicity, disability status, English learner status, and socio-economic status. Such disparities can have profound negative consequences.

Supporting Fair Practices in Test Design and Use

Given the role that values play in determining what is fair, I have recently wondered whether fairness is even a useful goal for assessment. As heretical as that may sound, if we cannot agree on a single definition of fairness, or what is fair, how can we ever agree on whether a test is fair or unfair?

Of course, I do believe that we can support fairer practices in test design and use, even in the absence of a universally accepted definition of fairness. I contend (as have others before me) that, as challenging as competing values can be for determining what is fair, values are the key to framing more useful and effective approaches to establishing, maintaining, and defending the fairness of a test.

I offer an example to help explain this statement. 

In the U.S., we see value in preparing students for success in college and career. One way that we determine their readiness is by testing them. When we use these test scores for college admissions, we are making an implicit merit-based case for the value of the test’s purpose, which is to objectively determine a student’s readiness for college success. We accept that a measure of a student’s academic knowledge and skills is useful for this purpose, so long as readiness scores are shown to be valid and reliable for that use, and for all examinees. If this assumption is true, universities can then use readiness scores in admissions and course placement decisions in ways that isolate the academic merit of applicants from other more subjective, or even biased, criteria.

But what if we reframe our value statement to say: students need a post-secondary education in order to be prepared for success in life? In this case, what is considered fair shifts from a merit-based to an equity-based determination, and our view of the possible consequences of test score use changes. In this second value paradigm, our equity-based values lead us to be concerned about the consequence of restricting access to the opportunity for earning a college degree. In such a scenario, we might characterize the consequences of testing as unfair, not because the test or its defined purpose (i.e., to determine a student’s readiness for college success) is inherently unfair or invalid, but because the use of those scores for college admissions plays a role in restricting access to college for those achieving scores below established readiness benchmarks. If a college degree increases one’s access to a successful life, we might argue that restricting that access, no matter how well intentioned, presents an unfair barrier to equal opportunity for some individuals and groups, and we become professionally, morally, ethically, and legally concerned about the effects of disparate impact.

So, how can we reasonably incorporate and balance multiple value systems in the fairness argument for a test? The answer is simple, although its execution is not. We start by acknowledging and respecting the different value systems that influence test purpose, use, and consequences, and by facilitating a meaningful and constructive dialogue around them. 

Balancing Values in Fairness Dialogues

Building on the example of college readiness, parties to such a dialogue might seek a way to retain the value of merit-based admissions and placement decisions through use of well-constructed, well-implemented tests, while working toward the goal of making college entrance possible for anyone who may wish to seek it. To achieve such a goal, we might ask, “How can we reconceptualize readiness testing in a way that recognizes that, given the right opportunities, all students who work toward the goal of college readiness can achieve it?”

If we value both equity and merit in determining access to a college education, we might consider alternative readiness test designs—designs that move away from an emphasis on a single cut score to define whether a student is ready for college and toward designs that incorporate the identification of the skills the student should develop to improve their readiness for success. Because this reframing focuses on paths to readiness, it would require multiple opportunities for students to receive feedback on their progress in a balanced and comprehensive assessment system.

A new fairness dialogue might also involve re-evaluating what constitutes academic merit itself by explicitly incorporating diverse perspectives in assessment design. By “diverse perspectives,” I include not only diversity of cultural, racial, ethnic, and gender groups, but also of different stakeholder groups, including representatives from universities, businesses, government and non-governmental organizations. 

It is time to address competing values head-on in our dialogue on fairness in testing purpose, use, and consequences. If we cannot, we relinquish that responsibility to the courts and risk decisions that set aside the many benefits to society that are offered by high-quality tests. Facilitation of such dialogues is the key to moving the cause of fairness in testing forward. We have a responsibility to build fair tests, for fair uses, with fair consequences. This obligation holds even when some consequences are unintended, and even when it is hard to do.