When It Comes to Getting Summative Information from Interim Assessments, You Can’t Have Your Cake and Eat It Too
“You can’t have your cake and eat it too” is a well-known idiom. In educational measurement, it captures the dilemma posed by the requirement for a single, summative score, and might read something like: “you can’t get summative scores for accountability purposes without the secure administration of carefully constructed forms in a defined window.”
However, the 2015 Every Student Succeeds Act (ESSA) attempted to provide one avenue around the need for a single, secure test administration. Specifically, ESSA allowed a “single summative score” to be produced from “multiple statewide interim assessments.” In response, NCIEA Founder Brian Gong and I wrote a brief in 2017 outlining key considerations for states examining this option. In essence, these considerations asked whether a state can:
- Garner enough agreement from stakeholders about what should be assessed and when?
- Support the logistics of administering, scoring and reporting interim[1] assessments throughout the year?
- Produce summative scores from the interim assessments that are technically rigorous enough to support accountability uses?
The answer to these questions is no, at least for now: no state has yet opted to pursue the interim option. As we noted in the brief, this option requires a “sustained, multiyear effort that goes beyond what is currently involved in typical summative assessment programs” (Dadey & Gong, 2017).
This statement was not meant to discourage states from pursuing this option (I think such an interim-based system could be both unique and worthwhile), but rather to illustrate the need for careful planning and implementation.
What the brief did not illustrate is how a state, working with districts and other partners, might use the design of an interim system to address the above considerations. Below, I explore one such novel design, in which the assessed domain is divided into many short, modular assessments. This exploration is not meant to be the final word on the design of these types of interim assessments; instead, it’s meant to provoke thought and, hopefully, inspire others to consider ways in which novel design can tackle considerations like those outlined above.
Another Take on Implementing an Interim Assessment System
When writing the 2017 brief, I originally envisioned interims very much like those typically found in commercial products: assessments that broadly measure a given content area, are often digitally administered, and are sometimes vertically scaled. I also envisioned a more “modular” design, in which the content area is divided into specific “chunks” of content, each measured by a different assessment, as in the Smarter Balanced interim comprehensive assessments and interim assessment blocks.
There are, however, other alternatives. For example, one might imagine a large bank of item sets, in which each set of items is aligned to an individual standard. Each item set wouldn’t be long: perhaps 10 or so items. A good way to think of these item sets is as “mini-assessments,” as each measures a single standard instead of a broader domain. These mini-assessments would be part of an online test-administration platform in which educators can select which mini-assessments to administer to students and when to do so. Further suppose that the platform allows users to flexibly group mini-assessments into larger assessments or give them individually. This design is actually the basis of Dadey, Tao and Keng (2018) and inspired this post.
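To make this design a bit more concrete, here is a minimal sketch, in Python, of how a bank of mini-assessments might be represented and flexibly grouped. The names used here (MiniAssessment, AssessmentForm, build_form, and the standard codes) are purely hypothetical illustrations, not part of any existing platform or of Dadey, Tao and Keng (2018); the sketch simply assumes that each item set is tied to a single standard and that educators can combine item sets into larger administrations or give them one at a time.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MiniAssessment:
    """A short item set aligned to a single academic standard (hypothetical model)."""
    standard_code: str   # a single standard, e.g., a code like "5.NF.1"
    item_ids: List[str]  # roughly 10 items per set
    grade: int

@dataclass
class AssessmentForm:
    """A grouping of mini-assessments that an educator chooses to administer together."""
    name: str
    mini_assessments: List[MiniAssessment] = field(default_factory=list)

def build_form(name: str, bank: List[MiniAssessment], standards: List[str]) -> AssessmentForm:
    """Pull the mini-assessments matching the requested standards out of the bank."""
    selected = [m for m in bank if m.standard_code in standards]
    return AssessmentForm(name=name, mini_assessments=selected)

# A teacher might group two standards into a single administration,
# or give one mini-assessment on its own.
bank = [
    MiniAssessment("5.NF.1", [f"item_{i}" for i in range(10)], grade=5),
    MiniAssessment("5.NF.2", [f"item_{i}" for i in range(10, 20)], grade=5),
]
unit_check = build_form("Fractions unit check", bank, ["5.NF.1", "5.NF.2"])
quick_check = build_form("Single-standard check", bank, ["5.NF.2"])
print(len(unit_check.mini_assessments), len(quick_check.mini_assessments))  # -> 2 1
```

The point of the sketch is simply that the unit of construction is the standard-level item set, and that larger "assessments" are compositions of those sets chosen by the user rather than fixed forms.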
The flexibility of this design is hard to implement well. It poses numerous challenges, from making sure the system isn’t “gamed” to ensuring that any designations based on the assessment results are comparable across students, teachers, and schools. However, the flexibility of such a system could allow users to develop programs of assessment without sacrificing much sensitivity to curriculum and instruction; a lack of such sensitivity is a hallmark critique of virtually all large-scale assessment systems. Such programs could include “recommended” groupings of mini-assessments that follow specific scopes and sequences of instruction. In addition, if the items within such a system were of high quality and could be easily viewed by teachers, then the system could also serve classroom purposes[2], although such visibility could conflict with state accountability purposes. I will address these accountability challenges in a forthcoming blog post. Stay tuned!
References
Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2(1), 67–90.
Dadey, N., & Gong, B. (2017, April). Using interim assessments in place of summative assessments? Consideration of an ESSA option. Washington, DC: Council of Chief State School Officers.
Dadey, N., Tao, S., & Keng, L. (2018, April). Developing scale scores and cut scores for on demand assessments of individual standards. Paper presented at the annual meeting of the National Council on Measurement in Education, New York, NY.
Perie, M., Marion, S., & Gong, B. (2009). Moving toward a comprehensive assessment system: A framework for considering interim assessments. Educational Measurement: Issues and Practice, 28(3), 5–13.
Way, W. D., Murphy, D., Powers, S., & Keng, L. (2012). The case for performance-based tasks without equating. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, British Columbia.
[1] I rely on Perie, Marion & Gong’s (2009) definition of an interim assessment as “Assessments administered during instruction to evaluate students’ knowledge and skills relative to a specific set of academic goals in order to inform policymaker or educator decisions at the classroom, school, or district level. The specific interim assessment designs are driven by the purposes and intended uses, but the results of any interim assessment must be reported in a manner allowing aggregation across students, occasions, or concepts” (p. 6).
[2] Although these purposes would need to be clearly articulated and supported with evidence, as detailed in the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014). Also, I comment on this need for specificity in a prior blog post.