Discovering the Need for Clarity on the Use and Effectiveness of Interim Assessments
This is the sixth in a series of CenterLine posts by our 2019 summer interns and their Center mentors, based on their projects and the assessment and accountability issues they addressed this summer. Calvary Diggs from the University of Minnesota worked with Nathan Dadey on a literature review on the use and effectiveness of interim assessments.
I was fortunate enough to intern at the Center for Assessment this summer. My project tasked me with the lofty goal of finding and dissecting the published evidence regarding interim assessment use.
Before I even got started with the project – a systematic literature review – I had to say something. It was a single question, posed in different forms in my application, during my interview, and again during my first meeting with my mentor Nathan Dadey:
Are you sure you don’t mean formative assessment?
It’s a question that reflects a tension among research, marketing, and educational practice (see Troy, 2011). It boils down to a simple but critical observation: our terms aren’t very well defined. I know, I know: when are they ever?
What Do We Mean When We Say Interim Assessment?
My education is in school psychology – as opposed to educational measurement or statistics – so I’m familiar with the confusion of terms. PBIS, MTSS, RTI, and even intervention are acronyms and phrases whose meanings seem to have district and regional mutations. So, I spent time learning about Perie, Marion, and Gong’s (2009) conceptualization of interim assessments.
In our research synthesis, we ultimately used Perie et al. to build our inclusion criteria for what makes an assessment an interim assessment:
A week and a half into this research, I was developing our initial coding scheme as well as refining our search terms and inclusion criteria. I felt confident in my understanding of our framing, but there’s something about working at the Center – even as an intern. You’re constantly pushed to think about things differently, especially when you’re least expecting it.
An Assessment’s Use Versus Its Name
In this particular situation, the other Center folks and I were walking back from lunch with the summer interns and psychometric staff from Measured Progress (now Cognia). There was a gentle rainfall, and Brian Gong started a conversation with me about interim assessments. While I dodged puddles, he asked me to describe what makes a formative assessment different from an interim assessment.
I gave my textbook answer.
He waited, and I admitted that I still hadn’t totally figured out the distinction between the two. What I knew as formative felt interim, and that bothered me.
Seamlessly, Brian guided us into a conversation about use, which made me think of Kane’s 2006 chapter on validation in Educational Measurement, as well as Assessment in Special and Inclusive Education by Salvia, Ysseldyke, and Bolt (2011). Some of my research is in reading comprehension, and good comprehenders (and that includes experts) have a way of integrating new knowledge by comparing and testing it against what they already know. Brian provided a lot of the scaffolding, but by the time we returned to the Center from downtown Dover, my mental representation of formative and interim assessments was leaps and bounds stronger. What had I learned to think about?
How an assessment is used is more important than its label.
Unlike formative and summative assessments, the label of “interim assessment” isn’t associated with a particular use; rather, it is associated with the time at which the tests are administered. As Juan D’Brot and Erika Landl wrote in a recent post, “Interim assessments are tools that fall in between these two ends of the spectrum, which were designed to inform teaching and learning throughout a course of instruction. The key distinction between different interim assessments is how the results are intended to be used in service to that goal.”
Interim assessments are, by design, a deliberately broad label that covers a variety of assessments and assessment systems, which is why their uses fall into three very broad categories: predictive, evaluative, and instructional.
A Deeper Understanding of the Uses Of Interim Assessments
Our project aimed to better understand the use of interim assessments and whether there was evidence supporting any particular uses.
Based on the initial 20 studies we were able to review this summer, we have identified the most common uses for interim assessments within each of the three broad use categories identified by Perie et al. Those uses are summarized in the table below. The labels and descriptions will likely change as the project progresses; however, this table reflects what they were at the conclusion of my work this summer.
Most Common Uses of Interim Assessments Within The Three Broad Use Categories: Instructional, Evaluative, Predictive
There is still a lot to unpack, but for the purpose of this post, what should come through is that how we use interim assessments matters because use does vary – a lot. When intentions for use aren’t clear, we may end up with interim assessments with adequate psychometric properties (Diaz-Bilello, 2011; Underwood, 2010) that are adopted into systems unprepared to use them to support and inform existing comprehensive programs. As such, null or even negative instructional utility may be observed, as was the case in Konstantopoulos et al.’s (2016) randomized controlled trial and in Jones’s (2013) non-experimental investigation of interim assessment use.
So, Where Does All This Information Leave Us?
Simply saying “changing instruction” lacks precision in both research and practice, and clarifying what we mean – not just using the label – is an important next step. Perie et al. (2009), and indirectly Kane (as I’ve learned), suggest that it is important to
And finally, from Nathan I’ve learned it’s important to
(c) clearly identify if there are enough supports in place to facilitate using interim assessment results to make effective data-based decisions.
References

Diaz-Bilello, E. K. (2011). A validity study of interim assessments in an urban school district (Unpublished doctoral dissertation). University of Colorado at Boulder, Boulder, CO.
Jones, K. D. (2013). The myth of benchmark testing: Isomorphic practices in Texas public school districts’ use of benchmark testing (Doctoral dissertation). Retrieved from Texas State University Library.
Konstantopoulos, S., Miller, S. R., van der Ploeg, A., & Li, W. (2016). Effects of interim assessments on student achievement: Evidence from a large-scale experiment. Journal of Research on Educational Effectiveness, 9, 188–208.
Underwood, M. (2010). The relationship of 10th-grade district progress monitoring assessment scores to Florida comprehensive assessment test scores in reading and mathematics for 2008-2009 (Unpublished doctoral dissertation). University of Central Florida, FL.