The Good, The Bad, and The Ugly of Innovation in Educational Assessment

Sep 15, 2021

Thoughts on What Has Worked Well, What Has Been Less Successful, and Why

This post is based on Charlie DePascale’s presentation during Session 1 of the virtual 2021 Reidy Interactive Lecture Series. Charlie, a Senior Associate at the Center from 2002 through 2019, enjoys writing, daily walks, and engaging in a smattering of consulting on educational assessment and accountability. Here, he provides his thoughts on assessment innovation.

Although we may not have moved as far beyond the bubble test as Secretary Duncan envisioned in 2011, all should agree that there have been significant changes to educational assessment, in general, and large-scale testing, in particular, in the last decade. The changes to large-scale testing since I entered the field in 1989 are even more dramatic. The process of innovating in educational assessment, however, has not been a smooth one. In this post, I describe efforts to make improvements in eight critical aspects of large-scale testing and discuss reasons why some have been successful (the good), some have failed (the bad), and some continue to be quite messy (the ugly).

Success in the Areas in Which We Have the Most Control

I think that there are four aspects of large-scale testing in which we have been quite successful in effecting change:

  • Scoring a wide variety of student responses 
  • Managing data throughout the assessment process
  • Shifting from paper-based to computer-based testing
  • Applying the principles and techniques of Item Response Theory

Changes in these areas have generally made the assessment experience more efficient for the assessment contractor, the state, or the end-users within schools and districts. Often, the changes have been the direct result of applying advances in technology. Many of the changes, such as improvements to scoring constructed-response items or effectively using student identifiers to track testing materials, occur behind the scenes and are not directly observable by the end-user. The successful shift to computer-based testing, on the other hand, has directly affected the type of information that can be conveyed to students through test materials and how the student interacts with those materials.

Developing solutions to challenges in these areas has not been easy, but for the most part, these challenges can be classified as simple problems; that is, problems with a clear cause and effect that have a straightforward solution, even if that solution required a wait of five to ten years for technology to catch up.

Improving the Utility of Large-Scale Testing Has Been More Challenging

Despite our grand designs and hopes, efforts to use large-scale testing to improve student learning remain quite muddled and messy. Although certainly not an exhaustive list, I have identified three areas to represent our struggles to make large-scale testing more useful to educators:

  • The shift to criterion-referenced testing
  • Developing a coherent plan for high school testing
  • Making use of Item Response Theory to generate content-based interpretations of student performance

One might argue that the three issues listed here are much broader than those included in my list of areas where we have experienced success. That is certainly true, and that is one of the reasons why these areas are messy, and success has been more difficult to achieve. Unlike the simple problems described above, the issues listed here are better described as complex problems; that is, problems with multiple causes that may interact with each other and are not easily discernible.

The assessment field has certainly enjoyed some success on tasks within each of these areas, such as in developing end-of-course tests aligned to state content standards or in developing and implementing standard-setting procedures to identify achievement level cutscores. Solving complex problems, however, requires a deep understanding of the problem to be solved. In the three areas listed above, rather than a deep understanding of the problem leading to effective solutions, I argue that our efforts to date

  • reflect a lack of agreement on what is needed or wanted
  • attempt to provide information that policymakers and educators want even if we do not understand it or we know that it is not yet supported, and 
  • demonstrate the disconnect between measurement and instruction

Efforts to Implement “Innovative” Assessment Programs Have Not Gone Well

Since I entered the field of large-scale state testing in 1989, I have been directly involved in or witnessed several attempts to implement so-called innovative assessment programs as part of an effort to improve instruction and student learning. Some programs have had limited success, but ultimately all collapsed for one reason or another. The programs that have been least successful are those in which implementing the assessment was the primary, or only, tool used to solve the problem. 

The failure – or limited success if you prefer – of innovative assessment programs should not be a surprise. The problems that these assessment programs are being asked to solve are neither simple nor simply complex. Rather, they are most often wicked problems. A wicked problem has been defined as “a problem that is difficult or impossible to solve because of incomplete, contradictory, and changing requirements that are often difficult to recognize.” Improving instruction and student learning within the landscape of the public education system in the United States certainly fits the definition of a wicked problem.

Another characteristic of wicked problems is that ill-timed solutions often fail to solve the problem and actually make it worse. In my experience, if one term best describes the innovative assessment programs that have been implemented, it is ill-timed. Used as a lever for change, assessment programs have often been implemented before

  • Educators are ready
  • Policymakers are ready
  • The assessment itself or the measurement field behind it is ready, or
  • Technology is ready.

The end result of these initiatives is typically frustration on the part of all stakeholders, a rejection of the assessment program, and increased cynicism toward the next proposed solution. 

Failure, or at Least Limited Success, Must Be an Option

As many of my colleagues have argued over the past decade, the solution is not to wait for the perfect assessment and the perfect conditions under which to implement it. Those will never occur. The solution, however, does require creating an environment that acknowledges and is flexible enough to accept the following:

  • Innovation is an iterative process – whatever the problem.
  • Wicked problems will always require solutions to be fine-tuned, adjusted, or revised over time.

That is not to suggest, of course, that we should accept administering assessments with test items that are poorly constructed, using networks that do not have sufficient bandwidth, or without robust field testing of the test items and the assessment. None of those actions are acceptable. 

However, if you have worked with stakeholders to identify and define the problem and are convinced that portfolios, performance tasks, through-year assessments, or other classroom-based assessments must be part of the assessment solution, then all parties must be prepared for the long haul – the iterative, and likely messy, process that will be required to implement those assessments.