Monitoring COVID Recovery Efforts

Educational Assessments are Simply Tools in Your Toolbox

“If the only tool you have is a hammer, it is tempting to treat everything as if it were a nail.”

-Abraham Maslow (circa 1966, as adapted from an even older British saying).

COVID has presented industries with new challenges and amplified existing ones (e.g., supply chain issues and staff shortages). The education space is no different. As an industry and a country, we are facing the realization that inequities in student access to educational opportunities may be growing, threatening to amplify longstanding challenges for students who are underserved. It is critical that we use the right tools to understand and overcome those challenges.

Taking Stock of Where We Are

In 2021-22, there are essentially three potential scenarios that may be playing out with respect to student learning. In broad terms, students will:

  1. Accelerate their learning and recover from instructional losses incurred during the pandemic.
  2. Decelerate their learning and show greater learning impacts due to compounded instructional losses. Others have written about this scenario and referred to the idea of an academic downward spiral (see my colleague Nathan’s blog here).
  3. Learn at a “normal” pace in the 2021-22 school year, improving but not fully recovering from instructional losses in 2020 and 2021.

We need the right tools to determine whether, and to what extent, these scenarios will play out. Large-scale state assessments measuring the breadth, depth, and complexity of state standards are one tool that can be used for this purpose.

In the spring of 2021, states had the opportunity to use large-scale statewide assessments to examine the impact of COVID disruptions and establish a baseline for pandemic recovery. However, not all states and districts could leverage data that were demographically representative and reflected the full performance continuum. In some states, fewer than a quarter of students participated in large-scale state testing. That testing remains one of the few standardized instruments that can be used to make comparisons across a state and determine whether students are making progress with respect to state content standards. Standardization simply refers to efforts to eliminate differences in administration conditions so we can make comparisons using an instrument that tries to fairly measure students’ knowledge, skills, and abilities on state standards.

The loss of, or threat to, state tests worried many states and advocacy groups, who recognized state tests as an important tool for making comparisons across schools and over time. It is clear that those states unable to use their state assessments effectively in 2021 will need additional tools to take stock of their efforts to accelerate learning. In reality, however, all states will find it difficult to evaluate their recovery efforts without a set of comprehensive tools.

Having the right tools will enable students and educators to monitor progress over time and corroborate whether that progress can be detected on large-scale assessments measuring the breadth, depth, and complexity of the standards.

Summative Assessment Provides a Broad Picture

Summative assessments provide users with key insights into broad-stroke student performance that can be used to evaluate system quality and corroborate local learning observations. However, state summative assessments were never intended to support finer-grained instructional decisions for students.

When we lost access to state testing in 2020, many districts and states turned to interim assessments as a substitute. Interim assessments, or those assessments that are administered periodically throughout the year, were assumed to provide instructional insight, support broad inferences about performance, and monitor COVID recovery. However, these assessments often replicated the same information as the end-of-year statewide test. Suddenly, it feels like all we have are lots of hammers when we need to dig deeper into our toolbox for more tools.

Leveraging Finer-Grained Assessment Information

Recently, I wrote about the importance of developing more nuanced data profiles of school quality that go beyond large-scale accountability. There, I proposed the idea of establishing a coherent data infrastructure that would help one build an evidence-based data story to support more informed decision-making. Furthermore, I argued that the big-picture signals that federal accountability indicators send are important (e.g., students are making progress toward grade-level expectations, students are getting access to broader educational opportunities, students are being equipped to attain English language proficiency). However, you can’t make progress on the big indicators without more frequent and finer-grained information to help you determine whether you are on the right road to get there.

Similarly, large-scale summative assessments help inform the big picture but are not useful for guiding ongoing instructional decisions. While one student might have a bad testing day, throwing into question the value of their individual score, it is unlikely that under typical administration conditions (i.e., non-COVID conditions) we would say the same thing about all the students testing in a school. But does that information tell you what to teach an individual student when they come back into a classroom next year? Of course not. It does, however, help you better understand questions like:

  • With what standards are kids broadly struggling this year?
  • Where might we want to invest in curriculum or standards-based professional development next year for teachers?
  • Did our recent intervention or curriculum deliver the results that were promised?

You’ll notice that these are all big-picture, evaluative-type questions. When isolated, they have their value for researchers, policymakers, and administrators. But we have to turn to other coherent signals to help us monitor progress throughout the year.

What we often fail to recognize is that summative testing has another, less talked about role: it is a means to corroborate small-scale observations and help make local adjustments.

What Do I Mean by Corroboration of Small-Scale Assessments?

Consider the following figure.

[Figure: small-scale assessments]

In the figure above, the “L” refers to lesson plans, the “PM” refers to progress monitoring assessments, and “BOY”, “MOY”, and “EOY” refer to beginning-, middle-, and end-of-year assessments, respectively. As the grain size of an assessment increases, it will cover a larger set of standards.

I would wager that if you were to ask an educator what layer of assessment is the most valuable, they would say either the lesson-plan check-ins or the curriculum-embedded assessments that occur throughout a unit. Those assessments are closer to real formative assessment practices because they help an educator adjust their instruction in real time, which many would call being instructionally responsive.

Because local assessments are either developed by individual teachers or schools from their own interpretations of state standards or purchased off the shelf, it is more difficult to evaluate whether they are aligned to state standards, and I do mean aligned in the broader sense (Standards for Educational and Psychological Testing, 2014). Here, alignment refers to the following ideas:

  • Match: The degree to which assessment items connect to standards.
  • Depth: The degree to which assessment items cover the cognitive complexity of the standards.
  • Breadth: The degree to which the set of items on an assessment (e.g., as operationalized in a blueprint or set of test specifications) covers the full range of the standards, consistent with the intended interpretation and use.
  • Performance Expectations: In addition to these three, I would also include the rigor of performance expectations. How well do you know that performance on a classroom assessment has the same rigor as performance on the state summative?
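
As a rough illustration of the first three ideas, the sketch below summarizes match, depth, and breadth for a small, made-up classroom assessment. The `Item` structure, the standard codes, and the complexity levels are all hypothetical, and the three ratios are just one simple way to put numbers on these ideas, not an established alignment method. Performance expectations are left out because they depend on comparing cut scores, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Item:
    standard: Optional[str]  # hypothetical standard code; None if the item maps to no standard
    complexity: int          # hypothetical cognitive-complexity level of the item


def alignment_summary(items: List[Item], standards: Dict[str, int]) -> Dict[str, float]:
    """Summarize alignment; `standards` maps each code to its intended complexity level."""
    matched = [it for it in items if it.standard in standards]
    # Match: share of items that connect to one of the standards.
    match = len(matched) / len(items) if items else 0.0
    # Depth: share of matched items at or above the intended complexity of their standard.
    deep = sum(1 for it in matched if it.complexity >= standards[it.standard])
    depth = deep / len(matched) if matched else 0.0
    # Breadth: share of standards touched by at least one item.
    breadth = len({it.standard for it in matched}) / len(standards) if standards else 0.0
    return {"match": match, "depth": depth, "breadth": breadth}


# Made-up example: four standards, five items (one item maps to no standard).
standards = {"S1": 2, "S2": 3, "S3": 2, "S4": 1}
items = [Item("S1", 2), Item("S1", 1), Item("S2", 3), Item(None, 1), Item("S3", 2)]
summary = alignment_summary(items, standards)
print(summary)
```

In this made-up case, four of five items match a standard, three of those four reach the intended complexity, and one of the four standards is never assessed, which is exactly the kind of gap that could leave a local assessment out of step with the state test.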

Classroom and curriculum-embedded assessments are perhaps the most powerful tool in an educator’s toolbox, but we need some confirmation that there is coherence between the small-scale measures and the signals we are getting from the large-scale measure. If complexity and performance expectations are coherent along the way, then progress on the small scale means we’re making progress on the large scale. If not, then our signals do not line up and we are left surprised that kids seem to be making progress on local classroom measures but still don’t show progress on the state ones. It’s like the idea of measure twice, cut once. But what if the calibration of a tape measure is off by 1/4 inch? Then it doesn’t matter how many times you measure. Your cut will always be off.
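
The tape measure point can be shown with a small simulation. This is only a sketch under assumed numbers: a 36-inch board, a fixed calibration error of 1/4 inch, and a little random reading error. The function name `average_reading` and all values are made up for illustration.

```python
import random
import statistics


def average_reading(true_length: float, bias: float, noise_sd: float,
                    n: int, rng: random.Random) -> float:
    """Average of n readings from a tape with a fixed calibration error (bias)."""
    readings = [true_length + bias + rng.gauss(0, noise_sd) for _ in range(n)]
    return statistics.mean(readings)


rng = random.Random(0)  # fixed seed so the run is repeatable
true_length = 36.0      # assumed true length in inches
for n in (1, 10, 1000):
    avg = average_reading(true_length, bias=0.25, noise_sd=0.1, n=n, rng=rng)
    # The remaining error settles near the bias rather than near zero.
    print(n, round(avg - true_length, 3))
```

Averaging more readings shrinks the random error, but the average converges on the biased value, not the true one. The same logic applies when local assessments hold students to a different level of rigor than the state test: more local data does not fix the mismatch.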

Using Multiple Tools Wisely

While we don’t want to see everything as a nail, it can be tempting to overcorrect and use every tool in your toolbox. After all, more information is better, right? Not necessarily. I would advocate for using multiple measures, but only as many as you need to feel confident in your decisions. Consider the idea of triangulation. Triangulation in research refers to using multiple measures and methods so that you can arrive at a more balanced explanation of what you observe (Tanveer & Bashir, 2008). Triangulation isn’t just about having multiple data points that revolve around a single point in time. You can triangulate in many different ways: around the efficacy of an intervention, around how students are acquiring knowledge, or around how groups of students are learning over time.

Effective triangulation requires a coherent set of assessments where small grain-sized information can be used to build progress stories that lead to larger confirmations about how students are learning. In other words, do we have evidence that gains on the summative assessment corroborate the improvement seen on local assessments? Part of answering that question is knowing what tools to use and knowing how to use them. By evaluating alignment in terms of match, depth, breadth, and performance expectations, we can begin building a coherent set of tools to continue monitoring recovery efforts.
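
One simple way to start asking the corroboration question is sketched below. It assumes you have, for each school, a gain on a local assessment and a gain on the state summative. The school names and numbers are invented, and the two helper functions (`pearson`, `discrepant`) are hypothetical. Because local and state scales usually differ, the sketch only looks at whether the gains move together and whether their directions agree.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def discrepant(gains: Dict[str, Tuple[float, float]]) -> List[str]:
    """Schools whose local gain is positive but whose summative gain is not."""
    return sorted(name for name, (local, summative) in gains.items()
                  if local > 0 and summative <= 0)


# Invented data: school -> (local assessment gain, state summative gain).
gains = {
    "School A": (5.0, 4.0),
    "School B": (6.0, -1.0),
    "School C": (2.0, 1.5),
    "School D": (-1.0, -2.0),
}
local_gains = [g[0] for g in gains.values()]
summative_gains = [g[1] for g in gains.values()]
print("correlation:", round(pearson(local_gains, summative_gains), 2))
print("local gains not corroborated:", discrepant(gains))
```

Here School B is flagged because its local measures show growth that the state test does not confirm. A flag like this does not say which signal is wrong. It is a prompt to revisit the alignment of the local assessment in terms of match, depth, breadth, and performance expectations.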