We Are Part of the Problem

Jun 23, 2020

An Argument for a Renewed Commitment to Anti-Racism in Educational Assessment and Accountability

Over the past few weeks, the country has grappled with how to confront systemic racism and create an anti-racist society. We are pleased to share this guest post by former Center Associate Susan Lyons with her perspective on what the educational assessment and accountability community must do to address those issues in the development and use of educational assessments and accountability systems.

In the wake of the murder of George Floyd and too many other Black Americans who have been senselessly killed by law enforcement, we are experiencing a collective call to examine systems of oppression and racism across our institutions. Racism and its human costs are entrenched in every corner of American life, and efforts to identify and dismantle systemic racism must go beyond reforming policing. As a professional in the field of educational assessment and school accountability, I am challenging myself and my colleagues to examine how our current systems contribute to the problem and how they may be reformed to explicitly elevate those who have been oppressed in our country for centuries.

I am certain that nothing I am saying in this brief is new. Instead, it serves as my own reckoning with the systems I have studied and helped to build in my professional career thus far. I am also certain that I have blind spots and may have misunderstandings or mischaracterizations. Thank you in advance for joining me in the conversation to understand and confront the origins and legacy of racism and oppression within the field of psychological and educational assessment.

The large and persistent gaps in test scores between White and Black students in this country have been well documented (see the Coleman Report, 1966). Efforts to eradicate these gaps have been well-intentioned and widespread, yet largely ineffectual. Since the 1960s, federal education policy has sought to provide additional resources to low-income communities and to hold schools accountable for educating all students. Policymakers doubled down on these efforts in 2001 with the passage of the No Child Left Behind Act, which was intended to close achievement gaps through test-based reform, an approach that remains largely unchanged in our public schools today.

The first two sections of this post highlight race-related issues in our current assessment and accountability systems; the final section discusses why now is the time to begin charting a new path forward.

Problematic Origins of Standardized Assessment

Any introductory psychometrics course will at least provide a cursory overview of the history of standardized assessment in America—starting with intelligence testing in the early twentieth century and the 1917 Army Alpha assessment used to sort and assign roles to thousands of army recruits for World War I. These early assessments—and many that followed—do not fare well against our modern standards for detecting and removing racial bias from assessment instruments (Galloway, 1994). In fact, these early instruments have been widely criticized for their use as “scientific proof” of white supremacy (Ladson-Billings, 1998). While our methods for estimating scores have advanced, and statistical tools for detecting and correcting for potential sources of bias have emerged and evolved, the old psychological theories that underlie assessment development have not changed (Shepard, 2000).

The assessment instruments that states use today to measure student achievement are almost invariably developed to measure student content knowledge on a unidimensional scale, a lasting byproduct of the early efforts to order people on an intelligence scale. Test items selected to fit a unidimensional scale are intended to maximize reliability within the constraints of the content being measured. This means that items are purposefully developed and selected to measure the strongest underlying factor that runs through all of the measured content standards in a given subject and grade. Items that produce information too different from that “underlying factor” are discarded or down-weighted in the measurement model. The most valued items for estimating the scale are those that best discriminate among examinees (yes, discrimination is a technical term, and it is a valued attribute in our current assessment models). Highly discriminating items are those that give us the most information for distinguishing among examinees. The unidimensional scale is used because it is an excellent tool for doing exactly what it was developed to do: reliably rank-order individuals along a continuum. I will argue, however, that the notion of a unidimensional scale is incongruous with both what we know about how people learn and our societal values.
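To make the jargon in the paragraph above concrete, here is a minimal Python sketch that uses classical test theory on a tiny, invented response matrix. It computes a corrected item-total correlation as the “discrimination” index and Cronbach’s alpha as the reliability estimate. Several things here are assumptions for illustration only: the data are made up, the helper names (`discrimination`, `cronbach_alpha`, `drop_items`) are hypothetical, many operational state assessments rely on item response theory (where discrimination is a model parameter) rather than these classical statistics, and the 0.2 screening cutoff is a hypothetical rule of thumb, not any program’s actual criterion.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical 0/1 scored responses: rows are examinees, columns are items.
# Items 0-2 were built to share one underlying factor; item 3 was not.
RESPONSES = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def discrimination(responses):
    """Corrected item-total correlation for each item: how well the item
    separates examinees who score high vs. low on the OTHER items."""
    indices = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total minus this item
        indices.append(pearson(item, rest))
    return indices


def cronbach_alpha(responses):
    """Classical internal-consistency reliability of the summed score."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def drop_items(responses, keep):
    """Restrict every examinee's responses to the item indices in `keep`."""
    return [[row[j] for j in keep] for row in responses]


if __name__ == "__main__":
    disc = discrimination(RESPONSES)
    # Hypothetical screening rule: keep items with discrimination >= 0.2.
    keep = [j for j, d in enumerate(disc) if d >= 0.2]
    print("discrimination:", [round(d, 2) for d in disc])
    print("kept items:", keep)
    print("alpha, all items:", round(cronbach_alpha(RESPONSES), 2))
    print("alpha, kept items:", round(cronbach_alpha(drop_items(RESPONSES, keep)), 2))
```

On this toy data, item 3, which does not track the shared factor, gets a negative discrimination and is screened out, and alpha rises from roughly 0.58 to roughly 0.84 without it. That is the selection pressure described above in miniature: the statistics reward items that line up with the dominant factor and penalize items that capture something different, whether or not that different information is educationally meaningful.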

We know that the process of knowledge acquisition depends on context and culture and is inherently multidimensional. Our learning theories have moved beyond trait and behavioral psychology (see How People Learn II, 2018), but our process for measuring learning through standardized assessment remains stuck in the behaviorist paradigm. In 1991, Lorrie Shepard criticized the measurement community for failing to recognize its reliance on outdated theories of learning and to explicitly account for advances in the science of learning in assessment design. That criticism is as relevant and urgent today as it was three decades ago.

Furthermore, the idea of creating an instrument optimized to reliably distinguish among individuals along a single trait is incompatible with the primary intention of state testing programs. These programs are intended to measure what students know and can do relative to the state standards and, increasingly, to track learning progress. State testing programs should be designed to explicitly value multiple pathways to learning (e.g., affirming and building on students’ own lived experience), accept multiple forms of evidence for demonstrating what students know and can do, and allow for more meaningful and useful measures to track the learning that has occurred. The current assessments were not widely adopted out of negligence, but because they are the only viable option for states given current federal requirements, state education budgets, and the very reasonable public demand that state assessments have as small a footprint as possible.

In the work ahead to draft the next reauthorization of the Elementary and Secondary Education Act, policymakers and measurement experts must work together to envision a new framework for assessment and accountability that moves beyond the familiar format to reimagine the purpose, use, and form of mandated assessments in our schools. The next section of this post explores how test-based reform, as currently practiced, has had disproportionate impacts on students and communities of color, contributing to the very problems it was intended to address.

The Role of Accountability Systems in Perpetuating School Inequality

Educators and school improvement scholars know more about the adverse impacts of standardized assessment on teaching and learning than I do, but we can all recall some of the most widely publicized examples. These include an overemphasis on tested skills at the expense of non-tested subjects (e.g., social studies, arts, physical education), and the adoption of “teaching to the test” instruction that focuses on drilling remedial and lower-level skills, taking instructional time and energy away from deeper learning.

Testing and accountability systems have been intentionally designed to focus resources and interventions on the lowest-performing schools. Schools that tend to do well on the tests, often directly benefiting from the relative privilege of their students, have experienced fewer of the negative instructional impacts. Scholars have argued that the low-level, test-based teaching practices widely documented in predominantly African American schools constitute systemic oppression of Black students (Davis & Martin, 2018). This is not to blame educators but rather to point out that the prevalence of “teaching to the test” is a predictable consequence of Campbell’s law.

As my colleague Charlie DePascale has repeatedly argued (see, for example, this blog post), the purpose of Title I accountability has always been program evaluation: to ensure that federal funds are being spent on effective programs. The assessment should, therefore, be considered one measure among multiple indicators that attempt to describe school program effectiveness. However, when the goal of instruction becomes improving scores on the Title I assessment, we know the system has been distorted in undesirable ways.

In addition to its negative impacts on teaching and curriculum, test-based reform has larger, system-wide negative effects that disadvantage communities of color. For example, the conversation about “good schools” and “bad schools” has too often been reduced to the simplistic rank ordering and labeling of schools based primarily on standardized test scores. Families with the resources to choose a home based on the perceived quality of the school district look to test scores to inform their decisions, a phenomenon that has been linked to increased racial segregation in schools, further entrenching inequality (Knoester & Au, 2014).

The Standards for Educational and Psychological Testing take an authoritative stance on the long-standing debate in the measurement community about whether the consequences of an assessment program, even unintended ones, can adversely affect the validity of the assessment itself. The Standards prioritize fairness in both the construction and the use of assessment instruments and thus regard the consequences of an assessment program as an important source of validity evidence (AERA, APA & NCME, 2014).

Given the known negative impacts of standardized assessment use on already marginalized communities, it is our responsibility as measurement professionals to step away from our reliance on the familiar toolbox for developing state assessments and to explore new possibilities (e.g., valuing locally generated information, exploring designs that support multidimensional reporting, developing content-based learning maps) that better reflect what we know about learning and better serve those most directly affected by test use: students, educators, and communities.

Why Now is the Time to Change

In June 2020, the world feels more uncertain than at any other time in my lifetime. Schools are facing unprecedented challenges, and students of color are bearing a disproportionate share of the trauma and learning loss brought by COVID-19. The collective call for racial justice is louder now than it has been before. We have an opportunity to use the unexpected pause in state testing and accountability to reflect on how the systems in place have contributed to the problem of racial inequality, and how we can work to correct our mistakes, however well-intentioned they may have been.

I am inspired by many of my colleagues who have long been tackling the problems outlined in this post. It has been my honor to serve as a technical adviser to the Massachusetts Consortium for Innovative Educational Assessment (MCIEA.org) for the past few years. This consortium represents a diverse set of districts working together to imagine and design a new kind of accountability system that values a broad set of measures of school quality. These measures were developed in direct response to what the districts’ own communities most value in schools, such as data on student safety, surveys of social-emotional wellbeing, and the adequacy of school resources and facilities. The districts in the MCIEA school quality reporting system are also actively working to include student achievement and growth as measured by instructionally embedded performance assessments. This passionate group of districts does not have all the answers, but it has worked diligently to create a space for difficult conversations about the failures of the current system, which are felt most acutely in communities of color, and about how we can envision and build a new path forward.

Reforming educational assessment and accountability alone will not address the systemic inequities in our society that lead to racial disparities in educational outcomes. The cards are stacked against students of color, who are more likely to be born into poverty, attend schools with fewer resources, and have teachers with fewer years of education and experience (Betts, Zou & Rice, 2003).

One of the primary goals of our current systems of assessment and accountability is to shine a spotlight on disparities in our educational system by providing a measure of achievement that can be disaggregated and examined by race. Our current high-stakes assessment systems have exposed large and persistent gaps, but they have failed to serve as a lever for reform and have even helped perpetuate systemic inequities. I look forward to finding new ways to harness the renewed energy around anti-racism to continue questioning and redesigning our systems of assessment and school accountability across the country.


Campbell’s Law – “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor” (Campbell, 1979).

Susan Lyons, Ph.D. is an independent consultant. Thank you to Scott Marion, Charlie DePascale, and John Poggio whose thoughtful feedback greatly improved this piece. Any errors remain my own. Please submit comments to slyons@nciea.org.