Implementing Innovative State Assessment

Houston, I Think We Might Have a Problem

One definition of insanity is doing the same thing over and over again, and yet expecting different results. Recently, I have been thinking about how this adage applies to innovative state assessment systems—those systems states are developing that offer new ways of measuring student proficiency for use in state accountability systems. I have concluded that one of the reasons that successful innovation in state assessment systems has been so rare is that we continue to ask states to implement innovative assessment systems before the state or the system is ready.

Since 2015, I have had the privilege of working closely with three states as they designed innovative state assessment systems, applied for federal approval, and, in some of those states, implemented their federally approved assessment systems under the Innovative Assessment and Accountability Demonstration Authority (IADA; Section 1204 of the Every Student Succeeds Act). From this vantage point, I have witnessed first-hand the challenges that arise in the design and implementation of innovative assessment systems approved under federal law.

My colleague, Scott Marion, has explained many of these challenges related to meaningful innovation in educational assessment due to the technical, scaling, design, and policy obstacles (see posts here, here, and here). Similarly, Charlie DePascale previously discussed what innovation means in testing and how innovation is especially complicated when the novel idea or invention upon which the innovation is based has not yet been developed (see post here).

Building off of that idea, one aspect that perhaps needs more air time (pun intended) is the difficulty that states approved under IADA face in trying to simultaneously develop and implement an innovative assessment system: the equivalent of building an airplane while flying the airplane.

Flying the Plane While Still Building It

The whole “flying the plane while we are building it” concept might work for some things, and in some cases, such as maintaining instruction during the pandemic, it is absolutely necessary. This approach, however, has not been particularly effective in large-scale testing and innovative programs that aim to

improve instruction through the use of assessments more tightly embedded with curriculum and instruction,
require a significant amount of professional development for teachers, and
rely on non-standardized assessment formats for use in a school accountability system.

This isn’t a new conclusion. One of the Center’s founders, Rich Hill, a driving force behind one of the most innovative state assessment programs of the 1990s, the Kentucky Instructional Results Information System (KIRIS), wrote about the challenges involved in ensuring the quality of portfolio scores used for school accountability purposes. In particular, after the demise of what would still be considered an innovative state assessment system intended to change teaching practices, Rich reflected that the “groundwork had not been completed for most teachers in most content areas before the inception of the new accountability system,” which led him to suggest that “a great deal of preparation and training in advance of the desired change is desirable, and probably necessary” (see the paper here). Rich talks about how the original intent was to use high-quality performance assessment, but in the end, the assessment resembled a “traditional multiple-choice test with a few open-response questions thrown in” (page 2).

And Kentucky isn’t the only example of challenges that face large-scale innovative assessment programs when reformers try to build and scale a high-quality, technically-defensible system that can be used for school accountability purposes while simultaneously attempting to support the professional learning needs of educators involved in the building, implementing, and scaling.

Tung & Stazesky (2010) wrote about similar issues related to large-scale performance assessment programs in accountability systems from the 1990s in Vermont, Rhode Island, Nebraska, New York, and Los Angeles Unified School District. In these contexts, the build-while-flying approach resulted in a loss of political will, especially due to concerns about the technical quality of the programs, and in the eventual demise of the programs or their reversion to traditional standardized tests.

How can we use the lessons of the past to stop the insanity of building the airplane while flying it in the current IADA era of innovative state assessment systems?

Prepare the Field

First, and perhaps foremost, states should consider the pre-application stage as crucial for building the professional capacity and infrastructure that can support the eventual design and implementation of the innovative assessment system. For example, if the state’s theory of action and innovative assessment system design will require teachers to design, administer, and/or score common performance assessments, is there a core of teachers trained within the state already to carry out these complex activities?

Or, if the state’s innovative system design assumes that teachers will use interim assessment data to adjust/inform their instruction, has the state piloted the interim assessments in classrooms and asked teachers the extent to which the assessment design supports the intended instructional use? These infrastructure-type building activities and stakeholder-input sessions prior to applying for IADA approval are not typical in practice.

Design First, Then Apply

Many states, even after applying for and receiving innovative assessment demonstration authority approval from the federal government, are still designing the actual assessments within the system and making critical decisions about how those assessments will be scaled together into a summative determination of student proficiency. This practice leads to another recommendation: states should not apply for IADA approval until after completing the assessment system design and, ideally, prototyping the process for at least a couple of grades and subject areas.

Piloting the innovative system in a representative sample of schools prior to submitting the application will enable the state to make informed decisions about how information from multiple assessments will be combined into a summative determination in a technically-defensible manner, especially as many innovative assessment programs collect information at different times over the course of the year and yet make an end-of-year claim about student proficiency. Piloting will also provide the state with critical information about the level of resources that will be needed to scale up the innovative system statewide.

Piloting the innovative system in a sample of schools that are still participating in the traditional state assessment also helps the state establish a baseline for subsequent comparability evaluations and puts the state in a much better position to iterate and change the system design prior to making any promises to educators, state policymakers, and the federal government.

States would then be in a better position to go ‘operational’ with their innovative system in the first year of IADA approval, rather than use the first year (or more) to conduct this design work while the IADA clock is ticking and all schools are still administering the traditional state assessment. Prototyping and iterative re-design cycles embedded within the pre-application stage arguably allow the state to design a more robust system based on early failures, misspecifications, and/or formative input from the field. Or, said simply: prototyping offers the state a chance to work out the kinks and mechanical glitches before taking the plane to the runway for takeoff.

Stop the Plane, I Want to Get Off

Many of the five states that have received IADA approval to date skipped some or all of these steps prior to completing the application because they planned to work out the details in the first few years of implementation. The problem with such a plan is that (a) the IADA timeline is short (after 3 years there is an IES review and, upon successful review, the program can continue for a couple more years) and (b) it relies on some assumptions we know to be highly unlikely given previous assessment system reform efforts:

educators across the state will have the professional knowledge and skills to support the valued outcomes of the assessment system reform when the time comes;
continuing to make multiple changes to the system after approval will not have a negative impact on buy-in among educators or the political will of policymakers; and
technical challenges will not result in design changes that undermine the original intent or belief in the technical quality of the innovative system.

Change the Current IADA Requirements & Process

While iteration and prototyping may seem like a ‘no brainer’ in any innovation or design-thinking process, these processes are not commonly practiced in innovative state assessment design. This is likely the result of many factors, not the least of which is that these activities are extensive and costly, and states must fund them out of already-constrained budgets.

However, if we don’t want to end up back in the same plane we started in—that is, back to traditional standardized tests—it seems wise to remind ourselves of the lessons learned from Kentucky and other innovative state assessment programs of the 1990s. One key lesson is that innovation requires extensive preparation, including capacity-building, infrastructure development, and prototyping.

And one way to act on these lessons may lie outside of the current IADA requirements. For example, imagine the benefits of a development process funded similarly to previous federally-funded assessment reform efforts such as PARCC and Smarter Balanced, where the main emphasis now would be on research and development related to innovative assessment systems.

The theories of action, development processes and procedures, rapid iteration and prototyping, technical manuals, and piloting could all be conducted with the intended purpose of proving that the concept could scale in a technically-sound manner – all without the pressure of also producing operational test results for accountability. Interested states that want to use these fully-developed and piloted innovative assessment system models could then focus their efforts on doing what’s needed to implement them at scale. This type of process meets the intended purpose of the current IADA – to fund research and development for the collective good and then let other states benefit from the knowledge gained – without requiring interested states to try to build the innovative plane while flying it.