The Future of Statewide Summative Tests: Avoid False Dichotomies

Currently, three states have publicly announced their intention to pursue a waiver from federal testing requirements for the 2020-2021 academic year. In a few other states, there appears to be movement in that direction. It’s not clear at this time if these requests will be limited to a small group of states or if recent developments are the front edge of a big wave. In this post, the Center’s Associate Director Chris Domaleski expresses concern about framing the issue of administering state assessments in 2021 as a false dichotomy, leaving states with only two options: staying the course with current practices or abandoning statewide summative testing altogether.

Statewide summative testing was canceled in the spring of 2020, and some people couldn’t help but notice that the world didn’t end.

It’s true that a wide range of the educational experience was canceled in 2020, such as in-person classroom instruction, graduation ceremonies, and extracurricular activities. A key difference is that almost everyone is rooting for these indispensable and basic practices to come back sooner rather than later. Standardized testing is clearly not missed nearly so much. Few lamented its absence in 2020, and some are calling for a continued reprieve.

Is it a good idea to resume statewide summative testing next spring? That’s not easy to answer. To start, there is much we do not know about the manner and duration of disruptions to the K-12 educational experience in 2020-2021. More broadly, earnest consideration of this question requires unpacking issues and assumptions about the nature and use of assessments. This essay was motivated by a specific worry that the current conversation about the future of testing often lacks context and nuance. In this post, I urge a more complete examination of the topic that recognizes a range of options for producing and reporting measures of student achievement from statewide summative tests in 2021.

The Concerns

What’s motivating calls for a testing reprieve next year?

The strongest critics may be using the current circumstance to push for a long-held ambition of discontinuing large-scale, standardized assessments. It’s unlikely that anything I’ll write here will influence these views.

Another motivation may be the conflation of assessment and accountability. This is understandable given that assessment results figure prominently in student and school accountability systems. I agree with those skeptical that status quo accountability can or should be continued in 2021, as I’ve detailed in a previous post.

Others may argue that the issue of testing primarily comes down to a calculus that the costs of testing don’t justify the benefits.

With respect to costs, proponents of suspending testing contend that it is not a wise use of limited resources. This may refer to the literal cost of testing – the funds required to develop, administer, and score the test, which are thought to better serve other purposes. However, cost may also refer to the time required to prepare for and administer tests. Lost instructional time is always a concern, but it’s particularly worrisome when learning opportunities are already at a premium.

The benefits of testing are always a subject of debate. However, policymakers, educators, and a wide range of stakeholders generally rely on statewide summative tests to help answer questions such as:

  • To what extent did students meet the state’s expectations for academic performance?
  • Are students likely on track to meet expectations at the next level (e.g., a subsequent grade, course, or post-secondary pursuit)?
  • What student groups, districts, and schools are most in need of support?
  • To what extent are programs and initiatives designed to support student success effective?

Some might think that the usefulness of this information will be limited if the opportunity to learn is diminished. On the other hand, many might argue that it has never been more important to collect information to understand and address inequities in education.

Note, I did NOT include information about using summative assessment results to inform instruction as a benefit, as the nature, timing, and ‘grain-size’ of the information from the summative test is not well-suited for these purposes. See Scott Marion’s piece on assessment for learning for more on this topic.

Can we thread the needle?

One can acknowledge that the concerns motivating calls for testing waivers have merit, while also wanting to preserve the benefits. I don’t think there is a single right answer to how states approach statewide summative testing next year and beyond, but I think a serious attempt to thread the needle should include a more complete evaluation of alternatives. Below, I’ll touch on some ideas I believe are directly tied to the concerns noted in the previous section.

Addressing Cost and Burden

Regarding the financial burden of assessment, it’s worth noting that most states may be able to reuse a previous assessment. For example, states may be able to recycle the test last administered in 2019. This approach is less costly than developing a new test for 2021. It should also reduce threats to comparability, which is vital if the primary intent is to evaluate trends or changes in performance from one point in time to another.

If reusing a test is not a feasible option, there are well-known approaches to reducing the burden of assessments by sampling content or sampling students. In fact, my colleagues Scott Marion, Chris Brandt, and I included this in our ‘pre-pandemic’ policy brief in which we offered recommendations for the next authorization of ESSA. Sampling content refers to reducing the length of an assessment for any one student by spreading out the items across multiple students. The technical term is “matrix sampling” and it essentially involves giving every student a part of the total test. By combining these parts for many students, we can recover assessment information at summary levels, such as for student groups, schools, or districts. There are many possible matrix sampling designs, some of which preserve some student-level information; others do not.
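To make the idea of matrix sampling concrete, the sketch below simulates one very simple design. It is only an illustration under stated assumptions: the function names, the round-robin assignment of items to forms, and the simulated responses are all hypothetical, and it is not a description of any state’s or NAEP’s operational procedure, which would be considerably more elaborate.

```python
import random


def assign_forms(item_ids, n_forms):
    """Split the item pool into n_forms disjoint blocks, dealt round-robin."""
    return [item_ids[i::n_forms] for i in range(n_forms)]


def estimate_group_percent_correct(n_students, n_items, n_forms, true_p, seed=0):
    """Simulate a matrix-sampled administration and return a group-level estimate.

    Each simulated student answers only one form. Every response is correct
    with probability true_p (an invented value, not real test data).
    """
    rng = random.Random(seed)
    forms = assign_forms(list(range(n_items)), n_forms)
    correct = [0] * n_items
    attempts = [0] * n_items
    for student in range(n_students):
        form = forms[student % n_forms]  # rotate forms across students
        for item in form:
            attempts[item] += 1
            correct[item] += rng.random() < true_p
    # Percent correct per item, then averaged over the full item pool.
    per_item = [c / a for c, a in zip(correct, attempts)]
    return sum(per_item) / n_items


forms = assign_forms(list(range(60)), 4)
estimate = estimate_group_percent_correct(400, 60, 4, true_p=0.7)
print(len(forms[0]), round(estimate, 2))
```

In this hypothetical setup, each student answers 15 of the 60 items, yet every item is still answered by a quarter of the group, so the summary estimate draws on the full item pool. No individual student, however, has a score on the complete test, which is the trade-off described above.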

Another approach to reducing the burden is to sample students. This simply means relaxing the requirement to test every student, every year, in every content area. For example, students may take a content area test once in the elementary grades, once in the middle grades, and once again in high school.

This is not uncharted territory. There is a strong precedent for the sampling of both content and students. After all, these methods are used by the well-regarded National Assessment of Educational Progress (NAEP). Moreover, sampling designs can be devised to produce information that supports high-priority uses of test results, such as monitoring the performance of student groups to identify equity concerns and evaluating year-to-year trends in student achievement.

Some of these designs may threaten the ability to produce credible measures of academic growth in 2021-22 and future years, but certainly less so than canceling the spring 2021 assessment outright. To the extent that growth is an important part of the state’s theory of action (and I argue it should be), this option should be evaluated very carefully.

Addressing Threats of Misuse

Regardless of whether a state pursues one of the options in the previous section to constrain cost or burden, a concern that results could be misused may persist. This is understandable given the extent to which assessments factor into accountability decisions. There is a straightforward way to address this: decouple tests from accountability partially or fully.

One approach is to report results at a summary level but not use them to influence accountability decisions for students, teachers, or schools. To be clear, assessments and accountability are not one and the same. We can proceed with the former while suspending or changing the latter.

If a state is going forward with some form of school accountability in 2020-21, it can make adjustments to the system, such as by changing weights or performance expectations. In this context, performance expectations refer to those associated with the accountability system (e.g., the performance required of schools to be classified as meeting expectations), not to the categories that describe students’ performance (e.g., proficient or advanced).

Another approach is to limit the uses or consequences of assessments in accountability decisions. For example, a state may decide to suspend plans to give all schools an index score or a letter grade and focus only on identifying the schools that should enter or exit the category reserved for schools most urgently in need of support (e.g., Comprehensive Support and Improvement). A wide range of accountability adjustments can be considered, which will be addressed in a forthcoming Center brief.

Regardless of whether or how assessment results are used in accountability, providing a range of guidance and support for appropriate interpretation and use is important.

Final Thoughts

There are no easy answers. In particular, I think it is unlikely there is a one-size-fits-all solution for every state when it comes to summative testing. It may be the case that when we get to the end of 2020, it is clear that it will be impossible to administer state summative assessments in some states in spring 2021.

My primary objective in this essay is to bring additional ideas and alternatives into the discourse about whether and how to restart statewide, large-scale assessments in 2021. Education leaders should seriously consider whether such efforts are preferable to forgoing information on student achievement altogether, information that could be used to help identify achievement gaps or direct limited school support resources.

The decision is too consequential to be framed as a false choice of ‘all-or-nothing.’