All I Wanted for Christmas Was a Through Year Assessment System that Promotes Deeper Learning
Fulfilling the Promise of Performance-Based Assessment
In my recent post, I promised to reveal the type of through year assessment design I’d like to see. After some background and context, I offer a high-level description of my design for a through year assessment system that promotes deeper learning while alleviating many of the problems caused by current assessment designs.
My colleagues and I at the Center for Assessment have been a bit obsessed with through year assessment systems. Nathan Dadey and Brian Gong first wrote about the ESSA flexibility for using interim assessments for summative purposes back in 2017. We picked up our writing about through year systems in 2021, with several posts by Brian Gong, Will Lorié, and me, culminating with the Center’s November convening: Claims and Evidence for Through Year Assessments: What We Know and What We Need to Know.
While some might think otherwise, Center professionals are not reflexively opposed to through year assessments. Rather, we are concerned by what we perceive as strong assertions or claims about the multiple uses of through year systems without similarly strong empirical and conceptual evidence to support those assertions and claims. To be fair, this might be due to the need to rush through the design phase in order to meet tight implementation deadlines. In what follows, I attempt to follow the guidance about design thinking offered by Nathan Dadey in Session #2 of our convening in describing the rationale and goals for my model.
What’s the Problem?
The first step in designing any new initiative is to identify the problems one is trying to solve. My main concern with the current end-of-year testing enterprise, in general, is that these systems have privileged breadth over depth. In other words, current systems are designed to meet alignment requirements, which might have been justifiable when first proposed, for tests to measure the full breadth and depth of the content and skills students are expected to learn. In reality, however, content coverage was almost always prioritized over measuring depth of thinking. Further, meeting these alignment requirements has led to tests based almost exclusively on selected-response item formats. Such tests send the wrong signal about the kinds of instructional and assessment tasks we want teachers to use in the classroom. My proposed through year system tries to turn this problem on its head by assessing the breadth of the content through the year while focusing on depth towards the end of the year.
Goals for the System
Once the designer (in this case, me) has outlined the problems with the current system, the next step involves clearly articulating the goals for the new system or initiative. These goals serve as the basis for the claims or assertions that can be supported by the test scores. The goals below are listed in a general order of priority, although I acknowledge considerable overlap among many of the aims:
- Measure students’ ability to think deeply and reason critically.
- Signal the type of tasks students should experience in the classroom.
- Support the development of balanced systems of assessment.
- Support the implementation of high-quality curriculum.
- Support meaningful school accountability determinations without conflating learning and accountability goals.
I also recognize that the system must be practically feasible. Therefore, my recommendations are intended to support the overarching goals in the most efficient manner possible.
My Proposed Through Year System
My proposed system employs a fairly simple design. Again, I care most about having assessments that allow students to provide evidence of deep thinking by synthesizing the concepts and skills they learned throughout the year. Therefore, my system:
- is anchored by rich performance-based tasks administered toward the end of the year (depth),
- includes curriculum-based assessments tied to high-quality units of instruction to document students’ opportunity to learn the full set of content and skills (breadth), and
- provides opportunities to increase assessment literacy through the documentation of local assessment quality.
High-quality performance-based tasks have been shown effective for promoting and measuring complex thinking and problem-solving skills (see Marion & Buckley, 2016). I’m not blind to the obstacles that have limited the widespread use of such assessments, but assessment design is always a case of prioritizing certain goals at the expense of others. It is time to prioritize the deeper learning that a performance-based assessment system can support.
I envision each task requiring at least 45-60 minutes. To foster deep understanding, the performance tasks will be administered “toward the end of the school year” so that students will have experienced multiple and varied opportunities to learn and apply key concepts and skills. We know from years of generalizability studies that approximately 4-6 tasks are required (depending on lots of factors) to produce a stable estimate of student (or school) achievement. However, I do NOT expect each student to complete that many tasks.
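As a rough illustration of why adding tasks stabilizes a score, the sketch below computes the relative generalizability coefficient for a simple persons-by-tasks design: person variance divided by person variance plus residual variance averaged over the number of tasks. The variance components are hypothetical values chosen only for illustration, not estimates from any study, and the function name is likewise illustrative.

```python
def generalizability_coefficient(person_var, residual_var, n_tasks):
    """Relative G coefficient for a simple persons-by-tasks design.

    person_var:   variance attributable to true differences among students
    residual_var: person-by-task interaction variance, confounded with error
    n_tasks:      number of tasks averaged into each score
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    return person_var / (person_var + residual_var / n_tasks)


# Hypothetical variance components (assumptions for illustration only).
PERSON_VAR = 0.30
RESIDUAL_VAR = 0.70

for n in (1, 2, 4, 6, 8):
    coefficient = generalizability_coefficient(PERSON_VAR, RESIDUAL_VAR, n)
    print(f"{n} task(s): {coefficient:.2f}")
```

Under these assumed values the coefficient rises quickly over the first few tasks and then flattens. Real variance components differ by subject, grade, task type, and scoring quality, so the number of tasks needed in practice is an empirical question.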
If an innovative assessment system were accompanied by a transformed school accountability system that reduced the obsession with highly reliable individual student scores, I could see administering just one or two rich performance tasks to each student using a matrix-sample design. Such an approach allows multiple tasks (e.g., 8-12) to be administered at a school while minimizing the testing burden on each student. I suggest exceeding the minimum of 4-6 tasks so the school can receive more information about school performance.
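To make the matrix-sample idea concrete, here is a minimal sketch of one way to spread a pool of tasks across a school's students so that each student takes only a couple of tasks while every task is taken by about the same number of students. The function name, the rotation scheme, and the school size of 120 students are assumptions for illustration. An operational design would also need to balance task assignment across classrooms and student groups.

```python
import random
from collections import Counter


def assign_tasks(student_ids, task_ids, tasks_per_student=2, seed=0):
    """Assign each student a small subset of tasks from a shared pool.

    Students and tasks are shuffled, then tasks are handed out by rotating
    through the pool, so task exposure counts differ by at most one.
    """
    tasks = list(task_ids)
    if tasks_per_student > len(tasks):
        raise ValueError("tasks_per_student cannot exceed the size of the task pool")
    rng = random.Random(seed)
    students = list(student_ids)
    rng.shuffle(students)
    rng.shuffle(tasks)

    assignment = {}
    for i, student in enumerate(students):
        start = (i * tasks_per_student) % len(tasks)
        # Consecutive positions in the rotation give each student distinct tasks.
        assignment[student] = [
            tasks[(start + j) % len(tasks)] for j in range(tasks_per_student)
        ]
    return assignment


# Hypothetical school: 120 students, a pool of 10 tasks, 2 tasks per student.
students = [f"student_{i}" for i in range(120)]
task_pool = [f"task_{i}" for i in range(10)]
plan = assign_tasks(students, task_pool, tasks_per_student=2)

# Count how many students take each task.
exposure = Counter(task for assigned in plan.values() for task in assigned)
print(exposure)
```

In this hypothetical case the school gets evidence from all 10 tasks, with each task completed by 24 students, while no student sits for more than two.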
If there is a requirement or desire for student-level scores, which seems likely, I would still limit the number of tasks administered to each student to no more than three, given the time demands. In order to support adequate levels of student-level reliability, one option could include administering a short (e.g., 20-item) test that relies on selected-response items, likely using computer-adaptive testing (CAT).
There is a well-established research base supporting the efficacy of using high-quality curriculum for improving students’ learning opportunities. Thus, my system relies on allowing school districts to choose from a small handful of high-quality curriculum packages that provide students with opportunities to engage in the types of thinking called for on the performance-based assessments. Additionally, since the assessments are often the weak link in these curriculum packages, the state, in consultation with local school districts, would help create end-of-unit assessments that could be embedded in selected high-quality local curricula. This is similar to the approach Louisiana has used for its Innovative Assessment Program. To maximize system coherence, at least some of these unit-based assessments should require students to demonstrate performance rather than relying solely on short-answer and selected-response questions.
Improving Assessment Literacy and Documenting Local Assessment Quality
Unit-based assessment scores would NOT be collected from local schools. Rather, school districts would be expected both to document the quality of local assessments and to provide assurances that students received the opportunity to learn the full breadth of the content standards. One form of assurance is an “assessment map” that documents the opportunities all students have had to demonstrate the required grade-level knowledge and skills. This was a key aspect of New Hampshire’s innovative PACE system. In other words, the scores from the curriculum-based assessments would stay in the classroom to be used to support instruction, avoiding the negative consequences that result from the conflation of assessment scores for instructional and accountability purposes. Such an approach documents the “alignment” of the full assessment system and provides educators with a basic understanding of systems of assessment.
Additionally, states could use scoring audits that have been employed at various times in Kentucky, New Hampshire, and other states. I would support having districts submit samples of student work to the state to evaluate the quality and rigor of local scoring systems. The goal of such an audit would be to help local educators internalize the rigor associated with this system, as well as improve local scoring quality.
I realize the devil lies in the details of implementation. Further, I’ve been around long enough to understand that no system is perfect, especially one designed by a guy sitting alone at his computer. Any design activity must meaningfully involve a diverse and representative set of stakeholders.
I can already hear concerns that my system prioritizes depth and doesn’t measure the full breadth of the content standards. Yes, I definitely prioritize deep thinking because our current approach to curriculum and standards-based assessment, which Bill Schmidt aptly described as “a mile wide and an inch deep,” has not served students or teachers well. I recognize the opportunity-to-learn concerns associated with limiting the “state test” to just a few aspects of the domain, which I tried to ameliorate with my curriculum-based components.
My goal here was to present a high-level design of a through year system that meets the critical goals I outlined above. I know there are many technical and logistical details left unspecified, but for now, I am aiming high for a system that can enhance deeper learning for both students and teachers. I invite others to offer suggestions regarding my proposal or, even better, offer their own proposal for an assessment system that signals and supports the types of learning we hope to see.