Finding the balance in balanced assessment systems

Is there a Recipe for Balanced Assessment Systems? 

Are Interim Assessments a Required Ingredient?

Let’s talk about recipes. Then we’ll move on to balanced assessment systems, what they are, and whether interim assessments qualify as a necessary piece of them. 

A Desire for an Easy-to-Follow Recipe

It’s not a secret to family and close friends — I like to bake. My interest in baking was a pre-pandemic discovery spurred by my son’s interest in cooking. I used to think baking was alchemical magic, but I’ve come to appreciate the mixture of science and art baking requires. The first thing I learned about the science of baking is that to achieve baking magic you have to have a recipe – a thoughtfully developed, fully tested, and carefully executed recipe. You cannot simply throw flour, sugar, and a few of your favorite spices together and hope for the best. As a novice baker, I would also seek out recipes that were easy to follow.

As I continued baking, I discovered the wonders of substitution. You don’t have a particular set of spices? There are substitutes. Missing (or not wanting) eggs? Substitute soaked chia seeds. Can’t eat typical flour because you’ve developed allergies to most grains (like me)? Substitute alternative flours that can be used for (mostly) the same purpose.  

I raise this example for an important reason. As you are designing a balanced assessment system, it’s essential that you begin with a recipe that is thoughtfully developed and fully tested, and one that you can implement with fidelity. But it is just as important that you are open to, and understand, substitution.

When you’ve designed the assessment system you would like to bake (sorry, implement), some elements may not be available. You may have to create some assessment tools on your own because they don’t exist. Are there issues with resources, politics, or other contextual constraints that prohibit you from using an essential assessment tool? Out of necessity, you may have to substitute alternative assessments that can be used for (mostly) the same purpose. 

I urge you to read the remainder of this post in light of the context of those who are making choices to leverage (or not leverage) things like interim, diagnostic, or benchmark assessments. 

Considering Balance in Balanced Assessment Systems

Let’s think about the idea of choices when considering our recipe for balanced assessment systems. Balance in assessment systems requires understanding what role each ingredient (i.e., assessment) plays in the system and knowing which ones are critical, which ones are bonus (i.e., optional), and which ones sit somewhere in the middle.  

Compare the idea of creating a balanced assessment system to baking. Many may try to sell you their ready-made, pre-packaged assessment system or argue that to create balance (or a terrific “cake”), you need to use their easy-to-follow recipe: 

Add one part summative, three parts interim, benchmark to taste, and fold in plenty of formative assessments to bake a balanced assessment system. 

Spoiler alert: There are no shortcuts or scripted recipes that magically create a perfectly balanced assessment system every time.

The thoughtful and purposeful use of what’s available, or of what needs to be created or procured, creates balance. In other words, assessments don’t create balance; use creates balance.

I cannot end this section without noting that formative assessment does not refer to an assessment event or assessment instrument, but rather to formative assessment practices: “a planned, ongoing process used by all students and teachers during learning and teaching to elicit and use evidence of student learning to improve student understanding of intended disciplinary learning outcomes and support students to become self-directed learners” (CCSSO, 2018, p. 2). Understanding this distinction is critical in thinking about formative assessment as a non-negotiable ingredient in the recipe for balanced assessment systems.

Recalling What We Mean by Balanced

As a reminder, balanced assessment systems must meet several criteria (see A Tricky Balance): 

  1. Coherence: A learning model connects the various assessments in the system to one another and to curriculum and instruction.
  2. Comprehensiveness: Assessments provide multiple views of what students know and can do, as well as support multiple users and uses.
  3. Continuity: The system documents student progress over time.
  4. Efficiency: The system has the minimum number of pieces needed to reach intended outcomes, without extraneous components.
  5. Utility: Are the assessments actually useful?

Are Interim Assessments a Critical Ingredient in a Balanced Assessment Recipe?  

Scott Marion said no in this post, at least not in the way that flour is required for most recipes. Formative assessment practices and other high-quality classroom assessments are the flour. Interim assessments might be more like vanilla or cloves. Those are great additions if you know how to use them, but overdo or misuse them and your muffins are wrecked.

I also question the value of interim assessments to improve student learning without a focus on the use of well-aligned, coherent, and feedback-oriented assessments. A case in point: 

The Council of Chief State School Officers (CCSSO) recently hosted its annual National Conference on Student Assessment (NCSA). During this conference, we heard a lot about the importance of balance in assessment systems and the role of interim and benchmark assessments in “monitoring student learning” (note: the name of the assessment [e.g., interim, benchmark] is less important than how it’s used).

While monitoring is valuable, I worry that we are displacing the more important task of improving student learning (through better instruction and feedback) with the task of monitoring it.

So again, what is the actual value of interim assessments if they are intended to not just monitor, but also improve student learning? 

Posing that question is not to say that I don’t see potential value in the use of interim assessments. There are others at the Center (like Scott) who are more pessimistic about the role (or lack thereof) of interim assessments in a balanced assessment system. I acknowledge that the evidence linking interim assessment use and improvements in large-scale performance is non-existent in the aggregate (see Dadey & Diggs, 2020).

The (Potential but) Limited Role of Interims

My own experience in West Virginia suggested that interim assessments (in this case, benchmarks) did lead to minor gains on summative assessments (see White, L., Hixson, N., D’Brot, J., & Perdue, J. (2010); the associated brief presentation; and White, L., Hixson, N., & D’Brot, J. (2010)). As you’ll see in the presentation link, commercial providers are quick to capitalize on these findings to promote the use of their products.

However, the findings beguile the potential user by downplaying the context. Our focus was on using interim assessment as a means for corroborating the quality and efficacy of formative assessment practices because the interims were coherently linked to summative expectations, not because we thought that interims themselves provided any real insight into instructional next steps. They just confirmed whether the instructional approach worked. It’s an uncommon use, but one that was supported by intentional modeling, professional development, and collaborative learning. 

It’s unlikely that interim assessments inform instruction in the ways we think of most frequently. My colleague Carla Evans, in a recent post, discussed the way in which large-scale assessment might be used to indirectly inform instruction. She also presented a few potential, but improbable, applications of large-scale assessment to more directly inform instruction.

Interims suffer the same fate. They do not have direct instructional utility in the way that high-quality classroom assessments do. In fact, interim assessments are evaluative in nature, and they are often most appropriately used to determine whether students have mastered a broad set of content or to track progress over time for different student groups.

If you really expect interims to provide valuable information and there is a plan to leverage them, then I suggest you consider, at a minimum, the following uses as the most likely to be useful:

  1. Corroboration: Does performance on interim assessments match up with my expectations based on my instruction and formative assessment practices? 
  2. Understanding Item Design: Can I examine the items on the interim assessment (and student performance on them) and identify strategies for improving my own test development? 
  3. Use of Feedback: Can I use feedback from the interim assessment to help students understand and own their learning? 

As you can imagine, you cannot assume that by using an interim assessment, all of these uses will automatically come to be. These uses require very thoughtful design, a comprehensive professional learning plan, and a significant investment in infrastructure to administer and report. In a recent post, a colleague and I discuss three bare minimum requirements for formal assessment events (e.g., interim, benchmark, diagnostic) to be useful and part of a balanced assessment system: 

  1. Focus on the standards: Are standards sufficiently covered in terms of match, depth, breadth, and performance expectations (see my recent post)? It’s not just about coverage but also about matching the rigor or complexity expected (over time) in the standards.
  2. Focus on expectations: Are performance and complexity expectations similar to those on the statewide summative assessment? Is the assessment event meant to help identify opportunities for scaffolding, or is it expected to reflect mastery of a given standard? Performance expectations matter. 
  3. Focus on coverage: Assessment events, whether formal or informal, have a targeted span of content. The narrower the focus, the deeper you can target certain standards. Conversely, the wider the set of standards, the shallower the evaluation of knowledge. Most interims and benchmarks are intended to cover a wide set of standards. Understand whether locally-developed or commercially-purchased assessments are covering the lesson, the unit, the semester, or the year and whether they are giving you worthwhile information. 

These three critical areas can help you evaluate the risk associated with using interim (or benchmark or diagnostic) assessments in lieu of formative assessment practices.

Final Thoughts

I’m not saying that it’s impossible for interim assessments to be useful in informing instruction. I’m just saying we really haven’t seen it happen systematically yet. If you know of any documented examples of interim assessments being used effectively to inform instruction, please reach out and share them with me. 

If you were to ask me what the most promising aspect of interim assessments might be, I would advocate for the role of corroboration. The grain size of interim assessment results is too large to be instructionally useful, but it can help confirm or disconfirm whether instructional efforts were on target over a particular period of time.

However, corroboration will only work if the interim assessment is aligned to the state’s standards, if it is aligned to the state’s performance expectations (as reflected in the standards and the large-scale assessment), and if results can be interpreted by educators and students in light of what curriculum has or has not been covered. Corroboration also requires a very savvy user base. And that requires a significant investment in assessment and data literacy learning opportunities, which are dependent on good assessment literacy programs, resources, and structures to deliver them.

Interim assessments might be able to support more effective teaching and learning if used carefully and appropriately, which could justify the investment in them as part of an assessment system. Ultimately, however, interim assessments are NOT required to balance an assessment system.