Evaluating the Impact of Statewide School Accountability Systems

5 key questions to ask when gauging accountability’s effectiveness

It’s time to ask a critical question: What evidence do we have that the school accountability systems developed under the Every Student Succeeds Act are working as intended? Adjusting for pandemic disruptions, states now have more than four years of data to examine. That creates an important opportunity, and it’s especially timely because the U.S. Department of Education has signaled renewed interest in state flexibility and waivers under ESSA. If states are considering changes to their accountability systems, those changes should be informed by evidence about what is and isn’t working.

Evaluating Accountability’s Impact on Schools

Evaluation is a broad topic. Multiple papers have been written about it, including this excellent resource by my colleague Juan D’Brot and others, which provides a much fuller treatment of accountability evaluation than I offer in this post.

In this post, I want to focus on one specific aspect of evaluation: impact. By impact, I mean the extent to which an accountability system is prompting the kinds of actions and behaviors that lead to school improvement. I propose five key questions that should guide an evaluation of accountability impact.

These are certainly not the only important questions. However, I believe these questions are especially important because they are central to the purposes ESSA accountability systems were designed to serve.

Five Key Evaluation Questions

1. Does the system identify the right schools for support and commendation?

A core feature of ESSA accountability systems is the identification of schools for Comprehensive Support and Improvement (CSI). States often identify additional categories, including one or more designed to recognize commendable performance. How can we ensure the “right” schools are identified?

The answer is part technical and part judgment. States can use technical tools (e.g., reliability calculations, classification consistency) to better understand the extent to which measurement or sampling error influences school classification. Such tools are often used to address both sensitivity and specificity. Sensitivity asks whether the system detects schools that are truly in need of support or worthy of commendation. Specificity asks whether the system avoids flagging schools that should not be included. For example, if analyses show that schools are highly likely to be misclassified for reasons not related to performance (e.g., n-size, grade configuration), that has important implications for accountability design.
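To make the two terms concrete, here is a minimal sketch in Python. It assumes an analyst has some reference judgment of which schools truly need support, for instance from a simulation study or an expert review; that reference is never directly observed in practice. The function name and the ten-school example are hypothetical and invented for illustration.

```python
def sensitivity_specificity(truly_in_need, identified):
    """Return (sensitivity, specificity) for two parallel lists of booleans.

    truly_in_need: reference judgment for each school (an assumption;
                   real systems never observe this directly).
    identified:    whether the accountability system flagged the school.
    """
    pairs = list(zip(truly_in_need, identified))
    true_pos = sum(1 for need, flag in pairs if need and flag)
    false_neg = sum(1 for need, flag in pairs if need and not flag)
    true_neg = sum(1 for need, flag in pairs if not need and not flag)
    false_pos = sum(1 for need, flag in pairs if not need and flag)

    # Sensitivity: share of schools truly in need that the system flagged.
    sens = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else float("nan")
    # Specificity: share of schools not in need that the system left unflagged.
    spec = true_neg / (true_neg + false_pos) if (true_neg + false_pos) else float("nan")
    return sens, spec


# Hypothetical example: ten schools, four of which truly need support.
truly_in_need = [True, True, True, True, False, False, False, False, False, False]
identified = [True, True, True, False, True, False, False, False, False, False]

sensitivity, specificity = sensitivity_specificity(truly_in_need, identified)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

In this invented case the system catches three of the four schools in need and wrongly flags one of the six schools that are not, so both kinds of error are visible at once.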

This question also requires judgment. Leaders may examine whether the schools identified for support, recognition, or commendation reflect the state’s goals and policy priorities. For example, how are these schools progressing in relation to the state’s long-term goals and measures of interim progress?

Ideally, states will develop clear descriptors for what each rating means, often referred to as school performance level descriptors (PLDs). Then, education leaders and other constituents can qualitatively review schools within each rating level to determine the extent to which the available evidence supports the intended meaning of the PLDs.

2. Do constituents understand and use results as intended?

An accountability system can satisfy federal requirements and still fail to prompt the intended actions. The system’s impact depends, in part, on whether constituents understand what the results mean and know how to use them.

The best way to evaluate this is to elicit direct feedback from a variety of users, such as education leaders, classroom teachers, families, community members, and policymakers. Surveys, focus groups, interviews, and structured conversations can provide evidence about whether the system is communicating clearly and supporting appropriate action.

When eliciting feedback from these groups, it’s important to ask the right questions. Feedback isn’t intended to produce a wish list of desired information and reports. Rather, the key questions might include: What did the accountability results help you understand? How did you use the information? What decisions did the results inform? What actions did you take? What was confusing, misleading, or not useful?

3. Does the system produce unintended negative consequences?

A strong evaluation should examine whether the system is encouraging behaviors that undermine the broader goals of education. For example: Are dropout rates increasing? Are schools discouraging enrollment of students who may lower performance results? Are important courses or career pathways being avoided because they are too rigorous or not central to the accountability calculation? Monitoring longitudinal trends in these and other areas, along with feedback from school and district leaders, can help identify potentially problematic patterns.

The point is to monitor for predictable risks and respond when evidence suggests those risks are becoming real and overshadowing the positive intent.

4. Are the schools identified for support improving?

If accountability systems are designed to identify schools for support and improvement, then evaluation should examine whether and to what extent identified schools are improving at a rate beyond what might have been expected otherwise.

States can begin with straightforward qualitative and descriptive analyses. What supports and resources do identified schools receive, and to what extent are they implemented and used as intended? Are schools improving on the indicators that led to identification? Are they meeting exit criteria? How long does it take? Do schools that exit improvement status maintain improvement?

More sophisticated analyses can also help state leaders monitor the extent to which schools identified for support are demonstrating significant and sustained progress. One useful example comes from New Hampshire, which tracked the progress of CSI schools, comparing them to a set of control schools. Results showed that schools identified for support improved at a faster rate.
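One simple way to frame a comparison of this kind is a difference-in-differences calculation: the average change for identified schools minus the average change for comparison schools over the same period. The sketch below is illustrative only. It is not a description of New Hampshire's method, the proficiency rates are invented, and a real analysis would also need to address how comparison schools are selected and how much of the difference could be due to chance.

```python
from statistics import mean


def mean_change(schools):
    """Average change in an outcome (after minus before) across schools."""
    return mean(s["after"] - s["before"] for s in schools)


def difference_in_differences(identified, comparison):
    """Change for identified schools beyond the change for comparison schools."""
    return mean_change(identified) - mean_change(comparison)


# Hypothetical percent-proficient values before and after identification.
identified_schools = [
    {"before": 30, "after": 38},
    {"before": 28, "after": 34},
    {"before": 35, "after": 39},
]
comparison_schools = [
    {"before": 32, "after": 35},
    {"before": 29, "after": 33},
    {"before": 36, "after": 38},
]

did = difference_in_differences(identified_schools, comparison_schools)
print(f"Identified schools gained {did} points more than comparison schools")
```

A positive value suggests identified schools improved faster than they might have otherwise, but only to the degree that the comparison schools are a credible stand-in for what would have happened without identification.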

5. Is performance improving for all student groups?

A central purpose of ESSA accountability is to keep attention on the performance of all students, including student groups that have historically been underserved. Therefore, an evaluation of impact should examine whether gaps are closing and whether favorable school ratings or improvement trends are masking low performance for some groups of students.

This involves looking at patterns over time for both schools and groups. Are outcomes improving for students with disabilities, English learners, economically disadvantaged students, and students from different racial and ethnic groups? Are achievement gaps narrowing? How do academic growth rates in schools with favorable ratings compare to those of schools with low ratings?

Such analyses can help determine if the system is benefiting all groups, not just increasing average performance.
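As a rough illustration of the gap question, the sketch below computes the difference between a reference group and a student group for each year and checks whether that difference shrinks every year. The group labels, years, and rates are hypothetical, and the "narrowing" rule is a deliberately simple assumption; a real analysis would account for group size and year-to-year volatility.

```python
def gap_by_year(rates_by_year, group, reference):
    """Return {year: reference rate minus group rate}, ordered by year."""
    return {
        year: rates[reference] - rates[group]
        for year, rates in sorted(rates_by_year.items())
    }


def is_narrowing(gaps):
    """True if the gap is smaller in every year than in the year before."""
    values = list(gaps.values())
    return all(later < earlier for earlier, later in zip(values, values[1:]))


# Hypothetical percent-proficient rates for all students and one student group.
rates_by_year = {
    2022: {"all_students": 48, "students_with_disabilities": 20},
    2023: {"all_students": 50, "students_with_disabilities": 24},
    2024: {"all_students": 51, "students_with_disabilities": 27},
}

gaps = gap_by_year(rates_by_year, "students_with_disabilities", "all_students")
print(gaps)
print("Gap narrowing every year:", is_narrowing(gaps))
```

Here both groups improve, and the gap also shrinks. The same check would flag a case where average performance rises while the gap holds steady or widens.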

Principles for Evaluating Accountability Impact

There are many ways to design and implement an accountability evaluation, but strong evaluations:

Center the theory of action. States should develop and refine a theory of action for the accountability system that addresses the system’s goals, the necessary conditions and resources, the actions to be taken, who is responsible for those actions, and the short- and long-term outcomes expected to result. A strong theory of action can guide both the design and evaluation of the system.
Emphasize technical quality. Information is only as good as the underlying data. It’s important to ensure that accountability data are trustworthy and useful for their intended purposes. This requires systematic quality assurance procedures, including both automated checks and human review.
Corroborate across sources of evidence. Using both quantitative data and qualitative evidence from multiple sources and over time will strengthen the findings of the evaluation.
Examine both intermediate and longer-term outcomes. Changes in factors such as planning, resource allocation, instructional focus, and support structures may influence outcomes over time. Good evaluations include both short- and longer-cycle reviews.
Support transparency. Be clear about methods and findings. Evaluation should not be a compliance or public relations exercise. It should help leaders understand the strengths and limitations of the system.

Inquiries Guided by Evidence

More than a decade after ESSA was passed, and following the pandemic pause, states now have enough experience and information to examine the efficacy of their statewide school accountability systems. It is time to look more closely at the extent to which these systems are working as intended: identifying the schools most in need of support, prompting productive action, avoiding harmful unintended consequences, and contributing to improved outcomes for all students.