Ch-Ch-Ch-Ch-Changes…Turn and Face the Strange World of Assessment

Mar 14, 2019

A Framework to Help State Assessment Teams Deal Effectively With Change

Heraclitus of Ephesus said that "The only thing that is constant is change." This observation certainly applies to K-12 assessment programs, where transitions seem to have become more frequent and more rapid in recent years.

Consider the following results from a recent informal survey of 21 states, conducted by the Council of Chief State School Officers (CCSSO), about assessment transitions over the past few years:

  • 16 states have changed assessment programs:
    • 10 changed from Common Core assessments to state-developed assessments or SAT/ACT;
    • 6 are making changes to their existing assessment programs;
  • 12 states have changed testing vendors;
  • 8 states have transitioned from paper-and-pencil to online assessments;
  • 5 states have shortened their tests;
  • Other states have implemented new science and/or social studies assessments, removed performance tasks, shifted from untimed to timed tests, changed to 100% machine/artificial intelligence (AI) scoring, added writing tasks, or moved away from an end-of-course model.

Many of these changes have been motivated by demand from the field for less testing time and faster score reporting. At the same time, states still feel pressure to produce assessment results that can serve multiple purposes, such as informing instruction, measuring student progress, determining readiness for college and careers, evaluating teacher effectiveness, and, of course, supporting federal school accountability requirements.

These increasingly frequent assessment transitions have highlighted the need for a framework that enables state assessment teams to evaluate assessments more effectively. Such evaluations would include understanding up front how changes to the assessment system could affect trendlines, performance standards, or other claims that assessment teams would like to maintain.

Moreover, given evolving needs and requirements in the field (such as new science tests, the continued shift to computer-based testing, and personalized learning), there is no reason to expect the pace of change to slow in the near future.

Balancing Maintenance of Some Aspects of Assessments Amid Changes

Even amid all these changes, however, there is usually a desire or mandate for the assessments to maintain performance trendlines. At a minimum, this means that the percentage of students attaining proficiency in English language arts or mathematics is comparable between the original and changed tests (i.e., that the inferences drawn from the benchmarks or cut scores are the same).

In more demanding cases, it could mean that scale scores on vertical scales are equivalent or can be compared directly across the assessments (e.g., that metrics such as average scale scores or scale score gains can be calculated and compared across the old and new tests).

To support the validity of these types of comparability claims, a validation process that evaluates and compares key aspects of the original and changed tests is needed. Traditionally, a standards validation or review process, in which committees of educators and stakeholders convene to evaluate the old performance standards (or cut scores) in the context of the new assessments, can serve this purpose. However, a standards validation process takes a substantial amount of time, effort, and resources to implement effectively, and it often cannot occur until after the assessment has been administered, when stakeholders are already waiting for results. Given the condensed timelines and budgets allotted for recent assessment transitions, a standards validation process may not be feasible in many situations.

Making Assessment Transitions Easier for States  

The Center is working with New Meridian Corporation, a state assessment developer, to design a framework that makes the assessment transition and standards validation process feasible for states. The framework, known as the Quality Testing Standards and Criteria for Comparability Claims (QTS), is driven by the goal of supporting states that wish to continue reporting results with the same performance levels and/or the same score scale after they transition from previous assessments using test forms developed under New Meridian (and administered by a specific vendor) to custom-developed state assessments that include content from the New Meridian bank (and are administered by a state-selected vendor). The underlying principles of the framework, however, are applicable beyond the New Meridian context.

The QTS is the basis of the expert comparability review process, in which a participating state submits evidence from the design, administration, scoring and reporting areas of its new program. Broadly speaking, the areas can be distinguished as follows:

  • Design: “What is on the test?”, including item and test development, fairness and accessibility;
  • Administration: "How is the test given?", including test administration procedures and supporting materials;
  • Scoring: “How is test performance determined?”, including item scoring, psychometric procedures and standard setting; and,
  • Reporting: “How are test results communicated and interpreted?”, including score reports and interpretive guides.

Independent expert reviewers evaluate the evidence to determine which types of comparability claims can be supported, and they provide constructive and actionable feedback for continuous improvement. The list of supporting evidence and the evaluation criteria in the QTS are based primarily on the 2014 edition of the Standards for Educational and Psychological Testing, widely recognized as the industry standard for assessment best practices. More details about this new framework can be found in my full paper, Assessment Validation in the Midst of Change.

The Importance of Collaboration During Assessment Changes

One final point to note is that the QTS is not intended to be a compliance framework. Rather, it is meant to encourage collaboration among the state's assessment team, its test vendors and partners, and expert reviewers, in which constructive and actionable feedback is exchanged to build a validity case for the state's new assessments. The types of evidence suggested in the framework encourage states to determine up front how the changes they are considering could affect the comparability claims they would like to make. The comparability review process then helps states adjust or fine-tune their transition plans, or prepare stakeholders for what can actually be maintained given the desired changes.

The comparability review process described in this post addresses radical change, such as when a state replaces a current test with a new one. Often, however, states face making minor or moderate changes to their current test. If you have not done so already, I highly recommend reading Re-envisioning Performance Standards Validation by my colleague Erika Landl, in which she proposes a principled approach to evaluating scales and cut scores when changes are made to assessments.