[Image: A bar passes through a ring. On the left side it is colorful and bright, but on the right side it is grey, to indicate that things that look good may not always be good.]

What Happens to Performance Assessment If We Use It for Accountability?

May 10, 2023

Yes, We Can Do It. But Should We?

High-stakes standardized tests are not popular. Performance assessment reforms, on the other hand, never fail to inspire conversations about the types of teaching and learning we value as a society and how much better such assessments would be for students and educational environments. 

However, I am concerned that federal requirements would overshadow or corrupt even the best-intentioned state performance assessments if they’re used for accountability. And I am not alone in this belief—research evidence from past state performance assessment programs supports this claim. 

Once the No Child Left Behind requirements were enacted in the early 2000s, for example, most of the state performance assessment programs died on the vine. Why? Because the federal requirements changed from school- to individual-level results, and the accountability pressure increased.

Accountability sets a very high bar for the technical quality of state-level assessments. Student state test scores drive decisions at both the individual and aggregate levels (e.g., identifying a student as proficient or not proficient, identifying a school in need of supports and interventions, and, in some states, allowing a student to graduate from high school). Because of those decisions, federal peer review imposes strict requirements related to validity and reliability, among other technical quality issues.

This means that those who promote using performance assessments for accountability may want to stop and consider exactly what it is that they want to argue for. I’m not talking about adding a performance task to a state test blueprint, which is not uncommon. I’m talking about the idea of replacing state standardized tests with statewide performance assessments in the current federal policy context. My hope in this blog post is to explain why I think that is problematic and to offer a couple of other options.

State Performance Assessment Collides With Federal Accountability 

Any assessment we use to meet federal requirements for annual accountability testing must meet very specific technical quality requirements. Those requirements are based on how the information will be used. These assessments produce a student-level score, which is combined with other indicators and aggregated to hold schools accountable for student achievement, academic learning growth, and closing achievement gaps.

If statewide performance assessment programs replace current state testing programs, they would have to meet the same technical quality requirements. This is true whether performance assessments replace state tests every year, every other year, or once per grade span. Here’s what would likely happen to the design, implementation, and scoring of performance assessments if they are used for accountability under current federal law:

Performance assessments will likely be

  • More standardized in design; 
  • Administered near the end of the year to maximize students’ opportunity to learn;
  • Audited for scoring quality if scored locally, to ensure the reliability of judgments (eventually, they’d likely be centrally scored or scored with automated methods); 
  • Less complex, with more contrived contexts or scenarios because student responses will be more limited (e.g., respond to this prompt in essay form, solve this math problem that only has one right answer, etc.); 
  • Constrained with very little (if any) student choice over the topic or way they demonstrate their knowledge and skills, which limits relevance to students’ cultural and social identities; and
  • Common: The same performance tasks will be used across the state.

So you see what’s happening here: State performance assessments will look and feel more like standardized tests if they’re used for accountability purposes. 

Reliability and Testing Time: Major Issues in Statewide Performance Assessment

Standardization isn’t the only major consideration when we imagine using performance assessments for accountability. There are well-known challenges with the reliability of student scores when using just a few well-designed performance assessments. We could get around this thorny issue if we didn’t need to produce individual student reports that are sent home to families, but that is not the current federal landscape. 

Reliability is fundamentally about consistency. It is not that human scorers or automated scoring systems can’t be trained to score student work from performance assessments in consistent ways and apply sufficiently consistent judgments. They can. The problem is that some students will perform well on one task while other students will perform well on a different task—and not necessarily because they know less (or more). It’s because a student’s performance on any one task doesn’t always generalize to his or her performance on all other tasks. And it’s important that test scores allow users to generalize from the specific test score to the larger domain (e.g., 5th-grade mathematics).

Time is another issue that arises when we consider using performance assessments for federal accountability. Research has shown that a lot of tasks are needed to get a stable estimate of individual student achievement, in the range of six to 12 tasks. That translates to a lot of time spent developing performance tasks and a lot of school time allotted to administering them. If the focus were on school-level accountability, however, we could get by with considerably fewer tasks per student, a point I’ll return to in the next section.
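To make the link between the number of tasks and score stability concrete, here is a minimal sketch using the Spearman-Brown prophecy formula. This is a simplification: it treats tasks as interchangeable parallel measures, whereas studies of performance assessment typically use generalizability theory. The single-task reliability of 0.30 and the target of 0.80 are illustrative assumptions, not figures from this post, and the function names are hypothetical.

```python
import math


def spearman_brown(single_task_reliability: float, n_tasks: int) -> float:
    """Projected reliability of a score averaged over n_tasks parallel tasks."""
    r = single_task_reliability
    return n_tasks * r / (1 + (n_tasks - 1) * r)


def tasks_needed(single_task_reliability: float, target: float) -> int:
    """Smallest number of parallel tasks whose combined score reaches the target."""
    r = single_task_reliability
    # Solve target = k*r / (1 + (k-1)*r) for k, then round up to a whole task.
    return math.ceil(target * (1 - r) / (r * (1 - target)))


if __name__ == "__main__":
    ASSUMED_SINGLE_TASK_RELIABILITY = 0.30  # illustrative assumption only
    for k in (1, 2, 6, 12):
        projected = spearman_brown(ASSUMED_SINGLE_TASK_RELIABILITY, k)
        print(f"{k:>2} tasks -> projected reliability {projected:.2f}")
    print("tasks needed for 0.80:", tasks_needed(ASSUMED_SINGLE_TASK_RELIABILITY, 0.80))
```

Under that assumed starting value, six tasks project to a reliability of roughly 0.72 and 12 tasks to roughly 0.84, which is why one or two tasks per student rarely support individual-level decisions.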

Current federal policy favors standardized tests because we need high levels of reliability to support the highly consequential decisions we make based on those tests. Policymakers, educators, and families must have faith that test results are reliable if big decisions like 3rd-grade promotion, or a school’s A-F rating, are based in large part on those results. And right now, both individual and aggregate uses of state test results are part of federal law. States must send home individual student reports to inform parents about their child’s achievement relative to grade-level standards. States must use student test scores as a key component of school accountability. 

Using Statewide Performance Assessments Under a Changed Federal Law

What are possible solutions to this dilemma for those who value performance-based assessment and also value monitoring school quality and achievement gaps through state-level testing? There are likely many different possible solutions—some more radical than others. 

One radical solution is to abolish accountability or profoundly change its policies. Here I err on the non-radical side. Below are two solutions that imagine a future where federal policy is adjusted to allow more innovation in state assessment. 

Stop requiring individual student results. This would allow a different approach to sampling what students have learned as a result of instruction. Right now, every student in each grade and content area receives an equivalent form of the same state test so we can say something reliable and comparable about each individual. But if we needed only to say something about the school, we could matrix-sample (give different tasks to students in the same school). If each student completed a small number of tasks, we’d be able to say something reliable and comparable about the school. These performance tasks might still need to be more standardized in design, implementation, and scoring than for classroom uses, but even so, many might see swapping bubble tests for performance tasks as a big improvement.
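The matrix-sampling idea can be sketched in a few lines. This is one simple way to do it, assuming a rotating block assignment; operational designs are more involved, balancing task difficulty, content coverage, and student subgroups. The function names, student IDs, and task labels below are hypothetical.

```python
import random
from collections import Counter


def matrix_sample(student_ids, task_ids, tasks_per_student, seed=0):
    """Give each student a small block of tasks, rotating through the task pool."""
    if tasks_per_student > len(task_ids):
        raise ValueError("tasks_per_student cannot exceed the number of tasks")
    rng = random.Random(seed)
    students = list(student_ids)
    rng.shuffle(students)  # randomize which student gets which block
    n_tasks = len(task_ids)
    assignment = {}
    for i, student in enumerate(students):
        start = (i * tasks_per_student) % n_tasks
        assignment[student] = [
            task_ids[(start + j) % n_tasks] for j in range(tasks_per_student)
        ]
    return assignment


def task_coverage(assignment):
    """Count how many students attempt each task across the school."""
    return Counter(task for tasks in assignment.values() for task in tasks)


if __name__ == "__main__":
    students = [f"student_{n}" for n in range(30)]  # hypothetical roster
    tasks = ["T1", "T2", "T3", "T4", "T5", "T6"]    # hypothetical task pool
    plan = matrix_sample(students, tasks, tasks_per_student=2)
    print(task_coverage(plan))
```

With 30 students, six tasks, and two tasks per student, every task is attempted by 10 students. The school-level result draws on all six tasks even though no individual student sits more than two, which is also why this design does not yield comparable individual scores.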

Stop testing every student every year. Administering standardized tests every other year (or maybe once per grade span) is enough to get the information we need about overall student performance to monitor schools’ performance trends and achievement gaps. By testing less, more focus and resources could be placed on promoting rich classroom activity systems for every student in every grade where teachers are implementing ambitious teaching and equitable assessment practices daily. Performance assessments would be one part of this broader assessment system—but used only for classroom purposes or local program evaluation purposes—not federal or state accountability. 

Performance assessments under this reduced-accountability framework can be

  • Locally designed (including co-designed with students) and more representative of community values;
  • Implemented within local curriculum scopes and sequences;
  • Scored by classroom teachers;
  • Complex and authentic, with multiple solution pathways and real-world scenarios;
  • Open-ended, with student choice over the texts, topics, and ways to demonstrate their learning; 
  • Relevant to students’ cultural and social identities; and
  • Uncommon: Tasks don’t have to be the same in all classrooms.

Accountability Trumps Assessment Innovation

If reformers are serious about reducing state standardized testing or using performance assessments instead, they’ll have to make serious tradeoffs under current federal policy. Ambitious reform in state assessment is not possible without reform in accountability (see related blog posts here and here). 

Just as culture eats strategy for breakfast, accountability eats innovation for dinner. Simply trading performance assessments for standardized tests under current—or even slightly revised—federal policy would most likely constrain the design, implementation, and scoring of performance assessments in ways that reformers may not want. 

If reformers don’t want to compromise any of their ideals, they should pursue accountability reform as a first step.