Stop Trying to Use Commercial Interim Tests for Instructional Purposes

A Case for “Strategic Abandonment”

At the Center’s recent Brian Gong Colloquium, Jill Abbott from Abbott Advisor Group brought up the concept of strategic abandonment. I was immediately intrigued. She used a quote by management consultant Peter Drucker to illustrate the concept: “All living things must have a functioning system of elimination … or they will soon perish.”

Jill said that strategic abandonment includes things organizations or people should stop doing, as well as things they shouldn’t start in the first place. These ideas have a lot of relevance for districts’ and schools’ use of commercial interim assessments for instruction. But before we go there, let’s take a step back, define terms, and clarify the commercial interim assessment context.

Defining Interim and Commercial Interim Assessment

Marianne Perie, Scott Marion, and Brian Gong, former and current colleagues at the Center, coined and defined the term “interim assessment” in 2007:

Assessments administered during instruction to evaluate students’ knowledge and skills relative to a specific set of academic goals in order to inform policymaker or educator decisions at the classroom, school, or district level. The specific interim assessment designs are driven by the purposes and intended uses, but the results of any interim assessment must be reported in a manner allowing aggregation across students, occasions, or concepts (p. 5).

Commercial interim assessments are simply interim assessments that test companies or vendors sell to districts or schools. Common examples include NWEA MAP and Renaissance STAR.

Three Main Uses of Interim Assessment

Perie et al. identified three main uses of interims: predictive, instructional, and evaluative. The predictive use case was the most common one in the policy environment of Adequate Yearly Progress (AYP) targets under No Child Left Behind. Districts and schools devolved into educational triage, adopting behaviors and strategies that focused resources on “bubble kids”: those students close to the proficiency cut score on the state test and, therefore, the most likely to help the district or school reach its AYP targets.

More recently, however, the most common use case for commercial interim assessments has shifted to instructional. We can see this in a landmark law and legislative report from Oregon. Passed in 2022, House Bill 4124 required the Oregon Department of Education (ODE) to:

  • Create a survey to collect information from school districts about the kinds of academic tests they require students to take (and why) in grades Pre-K through 12, excluding statewide summative tests.
  • Analyze data collected via the survey.
  • Use the data to establish a set of recommendations and best practices that can inform policymakers and education leaders about testing in Oregon.
  • Report the findings to the legislature by May 1, 2024.

ODE created the survey, known as the District Assessment Inventory, and asked districts about the name and purpose of tests they require, the grades and student groups participating, the cost, who uses the test data and for what purpose, who creates them, and the length of time spent testing.

Shifts in Use of Interim Assessments

According to the executive summary of the survey results, “most required tests were reported to have multiple uses and purposes.” The uses at the top of that list are all instructional (e.g., measure student progress; diagnose skills; plan lessons). The predictive and evaluative use cases are reported much less frequently. Additionally, 87 percent of responses indicated that teachers (not leaders) are supposed to use the data from the district-required tests, which also suggests instructional uses are being promoted.

From the ODE report we learn that many of the district-required assessments are commercially purchased. Chart 16 in the full report shows that districts spent on average about $8 per student last year, which equates to $4-5 million per grade level in K-8 and $1-2 million in high school on these commercial interim assessments—a $40-50 million annual investment!

And it is not just an investment of financial resources but also of instructional time. Students are spending an estimated average of about 15 hours per year on district-required tests in elementary school, about 17 hours in middle school, and 18 hours in high school. And this doesn’t even include state-, school-, or teacher-required tests. (For comparison, the report said that students in Oregon spent approximately 5-6 hours per year on state tests.)

The Relationship of Test Design and Use

One key takeaway from the ODE survey is that teachers and students are bearing the brunt of misunderstandings about the best use of commercial interim assessments, given their design.

As Scott Marion and I explain in our soon-to-be-published book, Understanding Instructionally Useful Assessment, interim assessments are typically designed in one of two ways. The first design is called “mini-summative” because it’s based on the same type of blueprint as the end-of-year state test (e.g., sample across the breadth and depth of grade 4 math standards). The second design is called “modular” because it is based on a subset of the content domain (e.g., fractions). Most commercial interims are mini-summative in design. Teachers may receive average scaled scores for their class or the percentage of their class scoring at proficiency or above.

This may raise a key question in your mind: How can a numerical score provide a teacher with instructionally useful information? I’m so glad you asked! In our book, Scott and I acknowledge that instructional usefulness exists on a continuum, but we are also clear on this point: If everything can be instructionally useful, then nothing is! We argue that

“An instructionally useful assessment provides substantive insights about student learning strengths and needs relative to specific learning targets that can positively influence the interactions among the teacher, student, and the content…. If the assessment doesn’t lead to changes in the interactions between students and teachers that improve student learning, we have difficulty considering the assessment, no matter what it does outside of the classroom, to be instructionally useful.”

Returning to Strategic Abandonment

This brings me back to where I began: strategic abandonment. If teachers need qualitative insights into student thinking so they know what to adjust in their instruction and why, then quantitative results from interim assessments that are not tied to specific curriculum and instruction will never be able to provide those substantive insights.

As we say in our book, “There is a difference between the data that teachers can interpret and use for instructional purposes and data that others within the educational system can interpret and use for evaluative or monitoring purposes… this is why we have different tests for different purposes and uses.”

That’s why districts and schools should strategically abandon the use of commercial interim assessments for instructional use cases.

Instead, if districts and schools need commercial interim assessments, they should focus on using them for evaluative purposes. In the Oregon study, this was the least commonly reported use. But it’s also the most promising.

One of the advantages of interim assessments is that, by definition, “the results of any interim assessment must be reported in a manner allowing aggregation across students, occasions, or concepts,” according to Perie and colleagues. Information that allows district and school leaders to monitor and track the effectiveness of intervention programs, curriculum implementation, professional learning, resource allocation investments, and the like is a key piece of school improvement efforts. Interim assessments can provide one source of evidence for these program-evaluation purposes, and would do so in a way that fits the design of the assessment.

Strategically abandoning commercial interim assessments for instructional uses and strategically selecting them for program-evaluation uses is a trade that would benefit the teachers and students who are currently bearing the brunt of these incoherent practices.