But Will It Work in Practice?
Practical Challenges to the Use of Through-Year Assessment for Accountability
The interest in using through-year assessment systems in place of traditional end-of-year state summative assessments has been increasing rapidly and was the subject of a convening hosted by the Center for Assessment in November 2021. With the justifiable focus on validity and other technical issues, many of the practical concerns associated with the use of through-year assessment systems seem to be flying under the radar. We contend that even if the technical challenges could be solved, the practical hurdles associated with implementation at the state level could trip up the best of through-year intentions.
We discuss the following practical challenges that worry us the most:
- Curriculum sequencing
- Missing data/students
- Interrupting instruction
For each of these issues, we briefly describe the problem, what we are hearing as potential solutions, and our remaining concerns.
Curriculum Sequencing
If assessments are administered throughout the year, purportedly to support instructional decision-making, then it is critical that students have had the opportunity to learn the tested content by the time it is tested. However, state leaders generally do not know when students have had the opportunity to learn the various content standards. Many states provide suggested curriculum maps and other instructional resources to support implementation of state content standards; nevertheless, the decision over what to teach – and when – is ultimately controlled locally.
Some states have proposed addressing this challenge by offering flexibility in how and when assessment modules are administered or by relying on “mini-summative” designs that represent the full end-of-year test blueprint. Each of these proposed fixes creates additional technical and validity challenges.
Flexible administration of assessment modules is essential for instructional purposes, but the more flexibility that is introduced into the administration sequence of summative assessments, the more pressing the comparability challenges become for supporting accountability uses.
“Mini-summative” solutions might address the comparability issue; however, we do not see how such designs can realistically serve instructional uses, or at least not more meaningfully than assessments explicitly designed to support instruction.
Missing Data/Missing Students
Anybody who has ever served as a state testing director knows full well how many different (and sometimes bizarre) circumstances arise that cause students to miss or have to make up tests. And that’s just for tests administered during one window at the end of the year. It is well known that many students who start the year in one school are not enrolled in that same school by the end of the year. This challenge is exacerbated in communities with high student mobility. Regardless, schools are still responsible for assessing all students, an increasingly challenging feat if multiple assessment administrations are required to produce a total score.
Through-year designs increase the challenge of assessing all students because students need to participate in more testing occasions to be considered a “participant.” Therefore, state assessment leaders must find a way to deal with variable student participation patterns. Given the nature of through-year designs, participation rates will likely be lower than with current end-of-year tests, which has direct implications for school accountability systems that require 95% participation.
Most commonly, states and assessment companies have proposed using the through-year components as priors in what we call a “delayed multistage adaptive test.” In this case, the tests administered throughout the year are used to help place students on the correct end-of-year test module. This method is untried and faces several hurdles. As we noted, many students may not complete all, or even any, of the priors. These students then have to take a longer end-of-year test, which creates problems during test administration because school personnel have to manage students taking tests of different lengths.
Further, we are concerned about fairness issues for students who do not progress through the learning targets in a conventional way. Such students could get routed into an end-of-year test that makes it hard for them to access the full score distribution. States have tried to address this scenario by having the end-of-year test cover a broader range of content and difficulty than such an adaptive design would otherwise require. Again, this leads to longer state testing than a single end-of-year assessment would entail, defeating one of the main aims of this enterprise: reducing testing time.
Interrupting Instruction
It might seem odd for us to suggest that an assessment system designed to support instruction could end up working against this intended use. However, when the through-year assessment events are considered part of the accountability process, they require a level of security not typically associated with tests designed to support instruction. Whereas true formative assessment is inseparable from instruction, and most interim assessments can be administered locally with minor interruption, assessments used for accountability require schools to implement strict administration protocols and often to “shut things down” during testing.
For example, to maintain a secure, efficient, and uninterrupted testing environment, many schools restrict the use of their network during testing, which disrupts instructional activities. Schools must also use all available computers for testing, which requires collecting devices loaned to students early so they can be prepared for testing. Doing so can disrupt student access to devices for instructional activities in and out of school.
There are also a host of ancillary activities such as training, scheduling, system checks, and rostering that are associated with high-stakes tests. States would need to conduct monitoring activities, remotely and in-person, to support the integrity of the administration. Taken together, these activities are time-consuming and burdensome, which further erodes capacity to support instruction.
Further, we have found that when district and school personnel associate the through-year or interim assessments with the end-of-year test, it is hard to escape the feeling that accountability is lurking even when it might not be. This will likely change behavior and could actively work against the instructional aims of the through-year design.
States might deal with these interruptions by attempting to reduce the accountability uses of the purported instructional components, but given the costs associated with item development, we do not see how states can avoid requiring strict test security protocols.
Center for Assessment experts have pointed out the considerable validity and other technical challenges with through-year assessment systems. However, we do not think that the practical issues associated with such designs have received enough attention. Even the most technically sound designs will fail if they cannot be practically implemented or if they do not garner educator buy-in.
We limited our discussion above to our main concerns. However, we are also worried about increased item development costs and the logistics of developing test items to fit within various through-year modules. Further, states are regularly trying to improve their test score reporting to make the results more useful to education stakeholders. Having to report results multiple times each year in ways that can support appropriate interpretations and meaningful actions adds a new layer of complexity. No doubt there are many other important issues to address.
We encourage those proposing through-year designs to create well-articulated theories of action to ensure both technical and practical issues are attended to in design and monitored during implementation. For example, a theory of action might describe how the designers intend for the through-year assessments to serve instructional purposes, but we cannot lose sight of the fact that these tests are ultimately summative. Designers must account for these practical and technical challenges in their theories of action.
Allison Timberlake, Ph.D. is the Deputy Superintendent for Assessment and Accountability at the Georgia Department of Education.