The Reality Faced by Innovators of Educational Assessments
Part 1 – What Does Innovative Assessment Really Mean, and What Has Recent Assessment Innovation Looked Like?
The Innovative Assessment Demonstration Authority (IADA) provision of the Every Student Succeeds Act (ESSA) ostensibly offers states the flexibility needed to “establish, operate, and evaluate an innovative assessment system” with the goal of using that educational assessment to meet the ESSA academic assessment and statewide accountability system requirements.
In two recent CenterLine blog posts, Executive Director Scott Marion described the challenges faced by states attempting to take advantage of the flexibility offered by the IADA and offered a hopeful view of what innovative assessment might look like under the IADA constraints.
In this post, I take a step back from the particulars of the IADA and consider larger questions of innovation in state assessment:
- What do we mean by innovation and “innovative assessment system”?
- What is the goal of innovation in testing?
Innovation is a difficult concept to nail down, but most attempts at defining innovation include the following principles:
Innovation requires making a change to an existing product or process through the novel application of ideas or inventions with the goal of adding value to the product by making it more useful to its consumers.
Most importantly, it is widely accepted that innovation is more than just a good idea, a new way of doing things, or even a better way of doing things. The final principle highlighted above – making a product or process more useful to its consumers – means that a change must be accepted and adopted by consumers to be considered an innovation.
Innovation in Large-Scale Educational Assessment
In their 2018 article, Moncaleano and Russell trace the history of innovation in large-scale educational assessment (usually via advances in technology) in terms of the interplay between efficiency and validity. Ideally, it would be possible to maximize both efficiency and validity, but in practice, efforts to increase one often result in a decrease in the other.
Efforts to Increase Validity
In the 1990s, the push for increased validity via authentic assessment resulted in state assessments that included direct measures of writing, constructed-response items, portfolios, and performance assessments. Although these formats offered the potential to increase validity, this round of innovation in assessment design significantly increased the cost of large-scale assessment, as well as the time needed to administer the tests, score student responses, and produce test results.
While the assessment community was still debating and sorting through those authentic innovations to educational assessment, the assessment and accountability demands of No Child Left Behind (NCLB) swung the innovation pendulum back toward efficiency. Nevertheless, the 1990s did produce some lasting innovations in large-scale educational assessment.
Advances in technology led to innovations in scanning students’ written responses and distributing the images electronically, which increased the efficiency of scoring and made it possible to route responses to scoring centers spread across the country. Efforts to further increase scoring efficiency via machine scoring of written student responses continue today.
Accounting for Every Student
Facing NCLB assessment and accountability requirements to produce individual student results annually for all students in grades 3 through 8, many states returned to tests composed primarily of multiple-choice, machine-scorable items. The problem to be solved in this era was the efficient testing and tracking of millions of students and the inclusion of virtually all students in the state assessment.
States developed student identification systems, which enabled them to ensure all students were included in the assessment system, assigned to the correct school for accountability, and classified by the appropriate subgroups for accountability. States also developed detailed accommodations policies and guides to ensure students with disabilities and English language learners had access to the assessment.
Those innovations in the ability to accurately identify and track students over time made it feasible for states to meet the assessment demands of NCLB and, in turn, gave birth to additional innovations in how we measure and report student performance (i.e., growth scores) and the frequency with which students are tested (i.e., large-scale interim assessment).
The Common Core, Race to the Top, and “a Tale of Two Innovations”
The NCLB era of assessment and accountability had barely begun when the pendulum began to swing again from efficiency back toward validity. Still four years away from NCLB’s 2014 deadline for all students to be proficient in reading and mathematics, states across the country agreed on a new set of standards, the Common Core State Standards (CCSS), that redefined what it meant for a student to be proficient.
Concurrent with the release and widespread state adoption of the CCSS, the federal government announced the Race to the Top Assessment Program (RTTT). Secretary of Education Arne Duncan heralded a new era of innovation in state assessment in announcing the Partnership for Assessment of Readiness for College and Careers (PARCC) and Smarter Balanced consortia as the winners of the RTTT assessment grants.
From 2009 through 2014, PARCC and Smarter Balanced designed and developed next-generation state assessments that went beyond the bubble test. The two consortia, however, took very different approaches to innovation in the design of their state summative assessments.
- PARCC focused on innovation in the content of the assessment, designing a core assessment that included a variety of new item types and tasks intended to effectively measure student performance against the depth of the CCSS.
- Smarter Balanced focused on innovation in the delivery of the assessment, designing a computer-adaptive test that could efficiently measure and classify student performance across the breadth of the CCSS. It also developed a system of accommodations, designated supports, and universal tools designed to enhance accessibility for all students.
Although these descriptions are admittedly simplistic and do not capture the full extent of the efforts of the two consortia, they do reflect fundamental differences between their flagship summative assessments.
As the RTTT assessment grants came to an end and it was time to implement PARCC and Smarter Balanced, a funny thing happened on the road to innovation.
Before PARCC and Smarter Balanced could complete their first operational administration in 2015, the innovation pendulum had begun to swing in a different direction once again. Large, time-consuming, top-down summative assessments administered at the end of the school year were out of fashion.
Both programs adapted and/or allowed states to adapt the assessment to meet their new needs. Smarter Balanced, however, was better positioned to function within the changing assessment environment and still serves a healthy number of states, anchored by California. The PARCC consortium, in contrast, has dissolved and the PARCC assessment introduced in 2015 has all but ceased to exist as a state assessment.
Returning to our principles for innovation, both PARCC and Smarter Balanced made a change to an existing product or process through the novel application of ideas or inventions with the goal of adding value to the product. However, as of fall 2019, only Smarter Balanced can claim to have met the final principle of innovation: meeting a need that made it more useful to consumers.
PARCC, therefore, may be a high-quality assessment, but it falls short of being an innovative assessment. That statement is not to say, however, that the PARCC consortium and the RTTT grant it received didn’t produce any innovations in testing. Over time, we may find that features such as particular item types developed by PARCC, or the PARCC approach to scoring and scaling combined reading/writing tasks, or the PARCC model for licensing items, meet the criteria to be considered true innovations in testing.
ESSA, IADA, and the Current Era of Innovation
At about the same time as the first PARCC and Smarter Balanced results were being released, the federal government passed the Every Student Succeeds Act (ESSA). States now had the option to use a series of interim assessments in place of a single summative assessment and to allow districts to use a college admissions test in place of the high school state assessment. Plus, with the Innovative Assessment Demonstration Authority (IADA), the federal government once again appeared to be promoting innovative assessment, albeit without the half-billion dollars that accompanied the RTTT efforts.
Upon close inspection, however, it appears that innovation under ESSA is focused on the efficiency end of the Efficiency – Validity continuum. Innovation under ESSA is defined rather narrowly as finding a better way for states to perform the same task they are accomplishing with their current assessment program; that is, measuring student achievement of the state’s academic content standards.
An argument can be made that the ‘better way’ could also enhance validity by including performance tasks, assessing speaking and listening standards, increasing student engagement with the assessment, and providing the ever-elusive actionable information to teachers. Within the current context of large-scale assessment and accountability policy, however, it seems clear that the innovation landscape under ESSA is slanted toward efficiency.
ESSA and large-scale state assessment, however, are not the only vehicles for innovation in testing and educational measurement. Advances in the development of more complex measurement models, increased use of Big Data, and measurement of a much broader array of constructs may change what we measure and how we measure it. Advances in school-, classroom-, and student-based technologies may also reduce the need for state assessment.
There is no doubt that technology will continue to produce new inventions and new approaches to testing. It always has and it always will. As we move forward, it will be exciting to learn:
- Which of those inventions will become innovations?
- Will innovation push the field toward efficiency or validity?
- Or will technology finally lead us to a solution that maximizes both efficiency and validity?
Perhaps the most important question is, will we control innovation in testing – or will innovation in testing control us?