The Reality of Innovation in Educational Assessment

Oct 22, 2019

Part 2 – Innovation in Educational Assessment is Messy and May Have a Different Goal Than Stakeholders Would Like

This post is the follow-up to my previous post discussing the realities of innovation in large-scale educational assessment. In Part 1, I defined innovation as a change that not only improved an existing process or product but also solved a problem or met a need and, therefore, was adopted and used; that is, it changed the way things were done in the field.

Part 1 also provided a high-level overview of the interaction between innovation in educational assessment and policy changes over the last three decades, demonstrating that shifts in educational assessment policy may abruptly alter the assessment landscape, changing what is needed from an innovative educational assessment.

In this post, I take a closer look at the realities of developing and implementing a process or product that changes large-scale assessment in a significant and meaningful way; in other words, an innovation. We begin with the reality that innovation in any field is messy. The path from idea to invention to implementation to adoption to innovation is rarely straight and is often filled with obstacles and setbacks. To that fundamental reality of innovation, we add three realities of innovation in educational assessment:

  1. Both invention and innovation in educational assessment often must take place in an operational setting.
  2. Innovation in educational assessment is often complicated by the need to combine the processes of invention and innovation.
  3. Innovation in educational assessment is often complicated by the field's tendency not to connect innovation in assessment with innovation and reform in curriculum and instruction.

We’re Building the Plane as We’re Flying It

The phrase “we’re building the plane as we’re flying it” has been used to describe major innovations in educational assessment and other areas of education reform since the 1990s. It refers to the common scenario in which a new initiative is implemented before it has been fully vetted, stress-tested, or – in some cases – fully designed. Even with educational assessments that appear to be ready for prime time when they are implemented, it is not uncommon for the first operational administration to function as a pilot test of the new program. In such cases, first-year test scores are used for accountability, but significant changes are made to the design of the program between the first and second years, giving rise to the apt oxymoron “operational pilot.”

Once regarded as a call for caution, the phrase “building the plane as we’re flying it” is now a mantra, used as a call to action and a reflection of the urgent need for school improvement. Whether that state of affairs is seen as desirable or merely necessary, the phrase conveys a sense of the risk and uncertainty associated with innovation in education and other fields.

The Importance of Understanding the Difference Between Invention and Innovation

The definition of innovation that I offered in the previous post focused on the application of novel ideas or inventions to improve an existing process or product. Although invention and innovation can be seen as part of the same process, innovation becomes much more difficult if the invention or novel idea is not close to fully developed before it is implemented. Apple, for example, tends to be sure that it has a fully functioning iPhone, iPad, or Apple Watch before presenting those inventions to the public in an attempt to change our lives. Yes, there may be some glitches when the new products are introduced to the public, but those glitches had better be minor.

We can revisit the Partnership for Assessment of Readiness for College and Careers (PARCC), Smarter Balanced, and the Race to the Top assessment program for an example of how this distinction between invention and innovation has played out in educational assessment.  

Smarter Balanced did not have to invent computer-adaptive testing (CAT), a key component of its innovative assessment. CAT was well established before Smarter Balanced attempted to apply it to large-scale state assessment. State assessment did come with requirements and constraints that called for adjustments to CAT algorithms, but it did not require inventing CAT. Additionally, when the time came to implement the new assessment, Smarter Balanced states were able to select an assessment contractor with experience in CAT to administer their state assessment. With Smarter Balanced and other computer-based assessments implemented over the last five years, we have witnessed the major, program-ending problems that can arise when states select an assessment contractor with limited experience implementing large-scale computerized assessments.

A key feature of the PARCC assessments, in contrast, was the development of new item types designed to better measure the complex knowledge and skills required by the Common Core State Standards (CCSS). To a large extent, PARCC was starting from scratch in inventing these item types. In its early years, the PARCC consortium commissioned research studies to develop innovative item types. The PARCC states also worked with their assessment contractors to come up with new approaches to assessing the CCSS. The consensus among PARCC participants I have spoken with is that they learned a lot about innovative item types over the course of the grant. Unfortunately, because the program was tasked with building an operational assessment at the same time that it was inventing new item types, it did not have the luxury of fully testing, evaluating, and perfecting those item types before it was time for the PARCC tests to be field-tested and become operational. The initial operational administrations of PARCC became, in essence, the testing ground for PARCC’s inventions.

Innovation in Educational Assessment is Usually a Small Part of a Larger Innovation

Virtually all innovative, large-scale educational assessment systems launched over the last three decades have been associated with an education reform effort that includes new content and achievement standards and requires significant changes in curriculum and instruction. In terms of the Efficiency – Validity continuum described in Part 1, these innovations in large-scale assessment invariably are intended to enhance validity, which leads to a harsh reality about innovation in large-scale educational assessment:

Innovation in large-scale educational assessment is not intended to make life easier.

We often associate the term innovation with products and solutions that make our lives not only better but easier. By their very nature, however, most innovations in large-scale educational assessment systems do not have that goal. Yes, they are designed to meet a need and solve a problem, but that need is often associated with raising standards and disrupting the status quo in order to reach a goal that has not yet been achieved – to improve instruction and learning for all students.

It would be nice if innovation in large-scale assessment could produce the magic “diet pill” – the one-item test that provides an accurate measure and actionable information to inform instruction – but that is not the reality of innovation in large-scale educational assessment.