Three Common Misconceptions About State Assessments

Proficiency, Comparability and Instructional Utility

Sit down, everyone! It’s time for a quick pop quiz on statewide testing. Here are the three questions:

  1. Do state assessment results inform instruction?
  2. Should test scores always be comparable?
  3. Should proficiency mean the same thing on state tests and NAEP (The Nation’s Report Card)?

All three questions highlight widespread misunderstandings about what state tests are designed to do and how they should be used. The testing field itself is partially to blame for those misunderstandings; it hasn’t done a great job of communicating about assessment.

In a bid to do better, the National Council on Measurement in Education—the professional association for those in the educational assessment field—brought together a few experts to shed light on aspects of statewide testing that are widely misunderstood. You can watch the webinar on common misconceptions about state assessment here, but we’ll share some highlights in this blog post.

Do State Assessment Results Inform Instruction?

Carla Evans, a senior associate at the Center, gave the presentation on this question. (She and the Center’s Executive Director, Scott Marion, are writing a book about this. Stay tuned later this winter for its publication. In the meantime, read Carla’s blog post about whether state tests can inform instruction.)

Evans pulled no punches about how she feels when people claim that statewide test results can inform instruction. “It’s like nails on a chalkboard,” she said. She reminded us that under federal law, the primary purpose of statewide testing is to evaluate school quality, and this purpose drives the design of state assessments. State tests are not designed to help teachers adjust instruction in the moment, with the students who just took that standardized test, Evans said.

Whether a test’s results can inform instruction depends on its content, form, timing, grain size, relationship to the enacted curriculum, and other factors, Evans said. At best, state test results can inform instruction indirectly, through the actions that flow from program evaluation.

To combat the misconception that state tests inform instruction, Evans urged state and local leaders to communicate clearly and often about the purpose and use of state tests and emphasize that they are not designed to directly inform instruction.

Should Test Scores Always Be Comparable?

Stephen Sireci, a professor of psychometrics at the University of Massachusetts Amherst, led this part of the discussion, noting that even testing professionals have long wrestled with questions about comparability.

Comparability permeates the field’s discussions about test fairness, Sireci said. It’s not just a matter of whether one state’s test results can (or should) be comparable to another. Even on the same test, there are questions about whether results mean the same thing for students who take different test forms, for instance, or for students of different racial, ethnic, academic or socioeconomic backgrounds.

Sireci said many believe that results are comparable on widely recognized large-scale tests. He used TIMSS, the Trends in International Mathematics and Science Study, to shoot down that myth, noting that students in many countries take that exam in different languages, yet results are reported “as if they’re on one scale.”

Sireci invited the field to carefully interrogate the need for comparability. He said testing professionals should ask these questions: What is the purpose of the scores, and do you need to make a comparability claim based on those results? If you do need comparability, what level of evidence do you need to support it?

Then Sireci went one step further, with a suggestion certain to spark questions and pushback:

“Don’t start out thinking about comparability,” he said. “Think about what’s best for each student. What type of assessment is best going to support students in their journey toward acquiring knowledge and skills? Focus on what’s best for each student and worry about comparability and accountability later.”

We don’t need comparability in all assessments, Sireci said. Comparability should depend on the purpose and use of the test. If competition for a job, or entry into a degree program with limited seats, depends on a test, then comparability is important, he said. But “95 percent of the assessments we do in education are not about competition, nor should they be. … Sometimes I think we care about comparability a little too much.”

Should Proficiency Mean the Same Thing on NAEP and State Tests?

Lesley Muldoon, the executive director of the National Assessment Governing Board, which sets policy for NAEP, led this section of the discussion. When asking this question, she said, it’s important to understand how achievement levels on NAEP differ from the achievement levels states set for their own tests.

“When NAEP says ‘proficient,’ it might mean something different than what State X says when it’s reporting to parents on student proficiency,” Muldoon said. The most recent in a series of state mapping studies demonstrates this. Results vary by subject and grade level, but in many cases, states’ definitions of “proficient” map more closely to NAEP’s “basic” level.

Those disparities are best used to inform large-scale improvement initiatives, Muldoon said. She noted the role NAEP data played in the reforms Massachusetts began in the 1990s, the literacy initiative Mississippi undertook in 2013, and the reading initiative Tennessee launched in 2008.

In the array of tests given at different levels, NAEP occupies a unique niche, Muldoon said. It wasn’t designed to shed light on classroom instruction. Its strengths are enabling cross-state comparisons and reporting on big-picture achievement trends over time. She warned against trying to re-purpose NAEP as a tool to understand school- or student-level results. “Trying to move it in that direction would jeopardize its status as a trusted independent benchmark,” she said.

Moderating the discussion, Marion reminded participants that aiming for perfect comparability can hobble test innovation. The field of testing is full of “lousy labels” that mislead the public, he said.

The lawmakers who drafted No Child Left Behind did not have NAEP-like levels of achievement in mind when they wrote “proficient” into the law, Marion said. But because both ended up using the label “proficient,” he said, the public must now contend with confusion and a perceived “honesty gap.”