Yes, But There Are Tradeoffs
State testing has long been criticized for many reasons. Its use as the main measure in high-stakes accountability systems has led to well-documented negative effects on school cultures. It takes precious time away from instruction, and worse, the results are not useful for improving instruction. These and other factors have only intensified calls for alternative approaches to annual testing.
One bill in Congress right now, the More Teaching Less Testing Act of 2023 (H.R. 1741), introduced by Rep. Jamaal Bowman, would remove the requirement for annual testing. On the other hand, a number of advocacy organizations argue that yearly, high-stakes testing is necessary to advance our educational system.
We undertook a study to examine the tradeoffs necessary to reduce the footprint of state testing. We share some early findings here and will provide more detail soon in a policy brief and a technical paper.
You Can’t Get Something For Nothing
All big decisions require tradeoffs, and the right path isn’t always easy to discern. This is certainly the case when we imagine reducing state testing. Saying we should reduce it is easy; identifying the potential costs of doing so is not. Consider the role that assessment results play in identifying schools—and groups of students—that need support.
One of the most important benefits of federal education law in the last three decades—since the Improving America’s Schools Act of 1994—has been its focus on the performance of identified student groups, such as students with disabilities and economically disadvantaged students. Federal law requires states to rely heavily on test scores to determine which schools and student groups need support. But reducing testing makes it harder for states to identify those schools and student groups.
Federal law requires each state to set its own “minimum n,” the minimum number of students considered a “group” identifiable for support in a state’s accountability system. But if fewer students are tested, more student groups will fall short of the minimum-n thresholds, increasing the chance that they—and their schools—won’t be identified for support. Some states may be in a position to offset this problem by lowering their minimum n to recapture more students.
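To make the minimum-n mechanism concrete, here is a small sketch. The threshold, grade list, and student counts are all invented for illustration; they are not study data or any state's actual settings.

```python
# Hypothetical illustration of the minimum-n effect.
# All numbers are invented; they are not study data.

MIN_N = 20  # example minimum-n threshold a state might set

# Tested students in one student group, per grade, in one school (made-up counts)
tested_per_grade = {3: 6, 4: 5, 5: 7, 6: 4, 7: 6, 8: 5}

def group_size(grades):
    """Total tested students in the group across the given grades."""
    return sum(tested_per_grade[g] for g in grades)

annual = group_size([3, 4, 5, 6, 7, 8])  # current practice: test grades 3-8
grade_span = group_size([5, 8])          # test once per grade span

print(annual, annual >= MIN_N)           # 33 True  -> group is identifiable
print(grade_span, grade_span >= MIN_N)   # 12 False -> group falls below threshold
```

With the same students enrolled, simply testing fewer grades shrinks the measured group from 33 to 12, dropping it below the example threshold; lowering the minimum n is one lever a state could use to recapture such groups.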
Reducing testing can affect schools in another way, too: depending on how states choose to reduce testing, their ability to measure student longitudinal growth—a critical evaluation and monitoring tool—might be compromised.
We designed our study to shed light on these and other tradeoffs that arise when states shrink their assessment footprints. We wanted to see if states could reduce the time students and schools spend on testing (including all the time spent preparing for testing) without sacrificing valuable information about which student groups—and schools—need support.
What We Did: Scenarios for Reducing State Testing
To examine these tradeoffs, we designed a prototypical accountability system that meets the requirements of the Every Student Succeeds Act (ESSA) for making accountability determinations. We computed those determinations using real historical data from two states for five plausible scenarios to reduce the testing burden:
- Testing once per grade span (in grades 5 and 8)
- Testing twice per grade span (in grades 3, 5, 6, and 8)
- Testing every other year (in 2018 but not 2019)
- Testing twice per grade span every other year
- Testing with a half-length test
We compared the accountability determinations produced with these scenarios against the baseline of current practice: annual census testing in grades 3-8 with a full-length test. We also studied the results with different minimum-n settings.
What We Found: Some Approaches To Reduce Testing Can Work
No matter the testing approach, ESSA requires state leaders to identify at least 5 percent of Title I schools for Comprehensive Support and Improvement (CSI), which is based on the performance of the whole school, not on specific student groups. The exact constellation of schools identified for CSI varies, from slightly to considerably, depending on the particular test-reduction approach and minimum n employed.
Except for the half-length test scenario, all our test-reduction approaches decrease the number of student groups within schools that meet the inclusion threshold for the “targeted” types of state support required by ESSA for schools with low-scoring student groups. Most testing-reduction strategies demand tradeoffs in school and student group identifications. But we found that two strategies—testing every other year and testing with a half-length test—hold promise for ensuring support determinations similar to the ones current systems produce, especially if states are able to adjust other accountability design variables (principally, the minimum-n setting).
What It Means: Implications of Reducing K-12 Testing
This study adds important considerations to the broader national discussion on accountability system reform, and it carries significant implications for testing-reduction initiatives. The results confirm our initial point that every major design decision demands substantial tradeoffs. Nevertheless, we found two designs—half-length tests and every-other-year testing—that could support reduced testing.
Importantly, implementing any of the testing reduction proposals we studied is not fully—or even mostly—a technical decision. It is a policy decision based on what’s important to those who are affected by the proposal. For example, testing every other year means that parents would get state assessment reports half as often as they do now. Some might find that tradeoff unacceptable, while others might say it could help states free up time and resources to support deeper learning approaches such as performance-based assessment in the grades without state tests.
We would have liked to evaluate a matrix-sampling approach, but we could not get item-level data from our state partners in time to include it in this study. Matrix sampling is the model used on the National Assessment of Educational Progress (NAEP) to efficiently test a large amount of content and skills by distributing only a small subset of the test questions to each student. This approach produces comprehensive scores at the aggregate level (e.g., school, district) but may not provide scores at the student level that reflect the full content domain.
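The matrix-sampling logic can be sketched in a few lines. This is an illustrative toy with invented item and student counts, not NAEP's actual design:

```python
# Toy illustration of matrix sampling: each student answers only one
# block of items, yet every item is covered at the aggregate level.
from itertools import cycle

items = [f"item_{i}" for i in range(30)]              # full content domain
blocks = [items[i:i + 10] for i in range(0, 30, 10)]  # 3 blocks of 10 items

students = [f"student_{s}" for s in range(12)]
assignment = dict(zip(students, cycle(blocks)))       # rotate blocks across students

# Each student sees only a third of the domain...
assert all(len(block) == 10 for block in assignment.values())

# ...but collectively the school covers every item.
covered = {item for block in assignment.values() for item in block}
assert covered == set(items)
```

Aggregate-level results can thus span the whole domain while each individual score reflects only the subset that student saw, which is why matrix sampling suits school- or district-level reporting better than student-level reporting.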
An important note about our study: We tested our potential test-reduction models using a standard ESSA-based accountability system. Several Center for Assessment experts have been urging our field to seriously rethink our current approach to school accountability. With even modest reforms of our accountability system, our standard for judging the efficacy of the various test-reduction models would change. But for the purposes of our study, we chose an appropriately conservative approach to evaluate the models: comparing their outcomes to the current accountability model.
We must attend to the potential unintended negative consequences of any test-reduction strategy. To be fair, we’d argue the baseline system must also be evaluated for its unintended negative consequences. All our reduced-testing scenarios—except for the half-length test—might cause responses in the educational system that we could not measure or control for in our study. For example, testing at select grades may cause schools to shift resources to those grades and away from non-tested grades.
Again, reducing the footprint of state testing is not a technical decision. But such decisions can and should be informed by the types of analyses we conducted in this study. Although we find promise for the potential of reducing state testing, we recognize that policymakers, depending on their values and the values of their constituents, might weigh the tradeoffs differently than we do.