The Next Generation of State Assessment and Accountability Has Already Started

Feb 19, 2020

Part 1: What Will Be the Next Dominant Pattern in State Assessment and Accountability, and What Might Cause It?

This is the first in a three-part series on the future of large-scale state assessment and accountability. Of course, it is impossible to know the future, but forecasts for educational assessment can be informed by examining what has shaped state assessment and accountability in the past. In this post, I look at the role played by emerging operational capacities and the desire for efficiency – specifically computer-based assessment.  

A prediction was put forth more than 30 years ago about what might be expected for “computerized measurement.” Victor Bunderson and colleagues predicted “four generations” of computer-based assessment marked by expanded functionality, deepened expertise, and increasing individuality:

  1. Computerized testing—administering conventional tests by computer;
  2. Computerized adaptive testing—tailoring the difficulty or contents of the next piece presented or an aspect of the timing of the next item on the basis of examinees’ responses;
  3. Continuous measurement—using calibrated measures embedded in a curriculum to continuously and unobtrusively estimate dynamic changes in the student’s achievement trajectory and profile as a learner; and 
  4. Intelligent measurement—producing intelligent scoring, interpretation of individual profiles, and advice to learners and teachers by means of knowledge bases and inferencing procedures. 

When Bunderson et al. proposed their typology in 1988, statewide testing was well-ensconced in policy and practice, but perhaps only a handful of states were investigating computer-based tests. Today, in 2020, state testing is clearly in the second generation of Bunderson et al.’s typology of computerized measurement – almost all states are administering state tests by computer, and many are offering computer-adaptive tests.

It is instructive to examine how and why that change to computer-based assessment occurred, and who helped make it happen.

In my narrative, from the 1990s through the early 2010s, state tests were custom, meaning each state test was developed to fit that state’s unique content standards, vision of validity, and administration conditions. The latter two factors influenced things such as how much of the state test consisted of multiple-choice items, written responses, or student-generated answers to math problems.

Several commercial assessment companies provided services to develop, administer, and score these state assessments. These companies provided individualized service to each of their state clients, reflecting the state’s values. For example, almost every state had an assessment of writing, but the states differed in:

  • how writing was defined in the state’s content standards, 
  • how writing genres were defined (e.g., narrative, persuasive, informational, analytic), 
  • in what grades each genre was assessed, 
  • the complexity of the prompts, 
  • the length of the expected response, 
  • whether a holistic or analytic rubric was used to score, 
  • how quickly writing tasks were changed across years, 
  • how writing scores were combined with reading scores to create English language arts (ELA) scores, 
  • what would be reported about writing performance, 
  • how ELA scores were equated from year to year, and so on.

Six companies provided full assessment services in this state custom market, from development through administration and scoring: CTB/McGraw-Hill, DRC, ETS, Harcourt, Measured Progress, and Questar.

Although the state tests were custom in the specifics, they all were administered on paper, and constructed responses were scored by human readers. However, that methodology changed in a short period of time. By 2020, the market had shifted somewhat away from custom state assessments toward assessments that share key aspects across states, and from paper-based to computer-based assessments.

These shifts are reflected in the mostly different companies that now dominate the state assessment market: AIR, DRC, Pearson, Questar, and Smarter Balanced. These companies own the software and can provide the expertise for computer-based state assessments. Of the companies dominant in the custom market, Harcourt was acquired by Pearson, CTB/McGraw-Hill was absorbed by DRC, and ETS and Measured Progress (now Cognia) contract with another provider for their computer-based platform.

What Underlies This Shift Among States and Companies to Computer-Based Testing? 

What happened between 2010 and 2020?

First, there is the technical capacity to deliver computer-based assessments, which requires expertise different from the skills needed to develop an educational assessment; for example,

  • how to publish text, graphics, and other media across a wide spectrum of devices with changing standards and web delivery software, and 
  • how to manage stable delivery across networks from Amazon servers to local school networks.

But this shift was not just fueled by advances in technology.

An underlying value was the desire for efficiency. Administering tests on computers rather than on paper was more efficient. Tests could be delivered to schools and individual students more securely and less expensively, and results could be returned to schools more quickly.

This efficiency was magnified because almost all states agreed on the same content standards. States were willing to give up individual state standards for common content standards. Adoption of the Common Core State Standards by 45 states in the early 2010s meant that assessment companies did not have to bring to bear the skills of making bespoke assessments, and companies could share items or even whole tests across states or consortia of states. 

Computer-based testing accelerated the search by states and assessment developers for “technology-enhanced item” (TEI) formats that could take advantage of the computer’s affordances (e.g., presentation of graphics, easy computer scoring of drag-and-drop and click responses, and potential automatic scoring) and replace expensive and time-consuming human scoring of student-constructed responses.  

What’s Next in State Assessment?

The third generation of computerized measurement, according to Bunderson et al., will be marked by “continuous measurement” in less intrusive ways (“embedded in curriculum”) and focused on an individual student’s learning. While that level of continuous measurement is still largely only a vision in state assessment, there are many advances along those lines in areas of educational assessment outside of state assessment for school accountability, primarily the assessment of individuals for classroom use or certification.

But in state large-scale assessment, there is a noticeable appeal in combining measurements taken at different times to supplement or even supplant the summative end-of-year test. States working on an early stage of this “continuous measurement” model include New Hampshire, Louisiana, Georgia, North Carolina, Nebraska, and Virginia. Most of these states are working with the companies supporting their large-scale assessments, often with additional partners.

One general approach uses interim assessments administered on a roughly quarterly basis. Another approach uses curriculum-embedded assessments that are administered more frequently throughout the instructional period. NWEA and ETS have announced their intent to develop commercial, off-the-shelf products that would support merging interim assessments and summative determinations.  

So, new capacities and affordances offered by advances in computer and other technology may foster changes in state assessment. But beyond doing more efficiently what was done before, there are other reasons for change.

Although marked by a common interest in more continuous measurement rather than a single end-of-year assessment, these states differ in their aims and underlying values. Some are interested in it primarily as a way to counter the perceived loss of validity that comes from measuring only what computers can currently administer and score in an on-demand assessment. Some are interested in the efficiency of a tighter alignment between local and federal assessments.

For a historical analysis of the past 100 years, see S. Moncaleano and M. Russell (2018). A Historical Analysis of Technological Advances to Educational Testing: A drive for efficiency and the interplay with validity. Journal of Applied Testing Technology, Vol. 19(1), 1-19.

In Part 2, I will explore changes in state assessment and accountability bubbling up from changes in educational theory and public values.

Note: I gratefully acknowledge those who have helped shape my thinking on this topic, chiefly my colleagues at the Center for Assessment, especially Charlie DePascale, with whom I’ve shared many hours of engaging conversation. However, the opinions expressed are mine and are not intended to represent the Center or my associates.