The Road from Defining and Measuring Alignment to Effectively Using It in the Design of Assessments and Assessment Systems
Alignment turns 25 this year. Happy Birthday, Alignment! Alignment burst onto the K-12 assessment landscape with the Improving America's Schools Act (IASA), the 1994 reauthorization of the Elementary and Secondary Education Act (ESEA). Among other Title I requirements, IASA required states to develop "assessments aligned to challenging content and performance standards."
As is the natural course in these matters, the law begat regulations requiring states to provide evidence of the alignment between their assessments and standards. Those regulations then led to criteria for alignment. Those criteria brought about protocols and measures for determining and classifying the degree of alignment. Then those protocols and measures became codified in peer review requirements. And there we have it: a really bad episode of Schoolhouse Rock, "How the Offhand Use of a Common Word in Law Becomes Reified as a Measurement Standard."
The Transition of Alignment from Word to Action
Alignment as a concept was not new in 1994. References sprinkled throughout IASA speak to the general principle of aligning policy and practice. Legal cases such as Debra P. v. Turlington established the importance of aligning curriculum, instruction, and assessment when assessments are used as the basis for high-stakes decisions. Even within educational measurement, no more than a modest understanding of validity theory was needed to see the relationship between having "assessments aligned to challenging content and performance standards" and the concepts of construct underrepresentation and construct-irrelevant variance. Alignment: it's a good thing!
Alignment, however, was not yet established as a measurement concept in its own right. The 1999 joint AERA, APA, and NCME Standards for Educational and Psychological Testing make no direct reference to alignment. Even the 2014 revision of the Joint Standards describes the term alignment in a manner that ties the concept to its historical roots in IASA:
“Content-oriented evidence of validation is at the heart of the process in the educational arena known as alignment, which involves evaluating the correspondence between student learning standards and test content.”
Within our quaint little educational arena, however, alignment is most definitely a real thing. The seminal work of Norman Webb and Andrew Porter, along with the efforts of organizations such as Achieve, established a solid foundation for alignment in the late 1990s and early 2000s. Peer review requirements under No Child Left Behind (NCLB) spawned a demand for external alignment studies. Ongoing efforts by Norman Webb and WebbAlign, Achieve, and edCount continue to define and refine our understanding of alignment, particularly as learning standards and expectations for evidence of student performance have become increasingly complex (e.g., the Common Core State Standards (CCSS) and the Next Generation Science Standards (NGSS)).
Twenty-Five Years Later, Where Does Alignment Stand?
After 25 years, however, we have barely begun to address, let alone answer, some of the most fundamental questions about alignment. As Webb stated in his 2007 article "Issues Related to Judging the Alignment of Curriculum Standards and Assessments," "These issues center around the basic question of when an alignment is good enough." Webb calls on the field to continuously evaluate and validate the rules that define acceptable levels of alignment.
Regarding his own alignment criteria, he states that "these acceptable levels have been specified for primarily pragmatic reasons," reasons grounded in assumptions about the standards, the design of the assessment, and the intended uses of the assessment results. Because those underlying assumptions vary across contexts and change over time, judging the alignment between standards and assessments "requires a certain subjectivity and cannot be based solely on a clear set of objective rules," and "this makes it critical that in any alignment analysis the underlying assumptions and how conclusions are reached must be made clear at the outset."
As with most things related to educational measurement and assessment, the answers to questions of “good enough” are never absolute or unambiguous. The notion of good enough depends on the context. “Good enough” for deciding whether a student needs a little additional instruction on a particular instructional unit is likely not good enough for determining whether to award a high school diploma. “Good enough” for identifying curricular strengths and weaknesses at the school level is likely not good enough for determining student mastery of particular content standards.
For the sake of simplicity, consider this example in which alignment is expressed in terms of percentages. We have three assessments judged to be 65%, 75%, and 85% aligned with the NGSS.
- Is the alignment of any of these three assessments good enough?
- What inferences about student achievement would be supported by one of the assessments, but not by the others?
- What are the acceptable uses, if any, for the assessment that is 65% aligned?
Over the last 25 years, the processes for evaluating the alignment of an assessment to standards have been based primarily on expert judgment. Answering questions such as those above, about what a given level of alignment means for the interpretation and use of results, requires empirical analyses to inform those judgments.
When is Alignment Good Enough?
As Webb acknowledged in both his 1996 monograph and his 2007 article, the process of designing assessments and assessment systems will always involve trade-offs. Those trade-offs usually begin with considerations of time and money. Trade-offs involving time alone can be further broken down into constraints related to the time available to develop the assessment, the time required to administer it, and the time necessary to process and report results. Time and money inevitably lead to trade-offs in the depth and breadth of the standards that can be measured on a particular assessment. Placing assessment specialists in a better position to answer questions about when an alignment is good enough will place policymakers in a better position to make decisions about those trade-offs.
Since 1994 and IASA, the theory of action underlying alignment has always been that improved alignment among state standards (content standards and performance expectations), state assessments, and curriculum will support better instruction, and that better instruction will lead to improved student learning.
We have spent much of the first 25 years since IASA defining alignment, determining how to measure it reliably, and determining whether states had, in fact, “developed assessments aligned to challenging content and performance standards.” Through much of that time, we also had the advantage of a testing environment that was standardized, stable, and strictly defined by federal requirements.
We are now, however, entering a brave new world of K-12 assessment. If alignment is going to fulfill its promise, we must devote at least as much attention to answering the basic question of when an alignment is good enough as we have devoted to defining and measuring it.