What’s in a Name? The Challenge of Testing Terminology
How We Talk About Assessments Matters
We have a naming problem in our field. Formative, interim, summative, diagnostic, benchmark—and my latest favorite, universal screeners—are just a few examples of testing terminology that is used in ways that confuse many people. This confusion is consequential: It can lead people to think that an assessment will serve a purpose it was never designed to serve.
I might sound like a cranky old man, but this confusion also has a hidden cost for students and is an obstacle to assessment literacy for educators. I’ll walk you through a few examples and end by hopefully offering a way out of this mess.
Interim or Formative Assessments?
My colleagues, Marianne Perie and Brian Gong, and I coined the term “interim assessment” back in 2009. We were concerned that the proliferation of these commercial products was co-opting the well-established research base underlying formative assessment; these products were—and are—anything but formative.
In fact, when we searched the internet for “formative,” “benchmark,” and “diagnostic” assessments, the searches all pointed us to the same products. We categorized the intended uses of interim assessments as evaluative, predictive, and instructional, to distinguish them from “formative assessment processes,” which are solely instructional teaching strategies.
Our message didn’t take hold as well as we hoped. In the current environment, everything from short testlets administered every few weeks to mini-summative tests administered a few times a year is marketed as interim assessments. We might have cleaned up some of the language, but I don’t think we have accomplished our original goals, because I still hear of plenty of products called “formative.” Perhaps the originators of formative assessments should have named them “formative instruction”; that’s truer to their use-case.
I ranted about the misuse of this term back in the early days of the pandemic, because it seemed that overnight, every interim assessment—there’s that term again—was suddenly diagnostic. Why did this matter? Because people thought they were going to get targeted information to address students’ specific instructional needs. In general, they didn’t, and won’t, at least not from the assessments that were—and still are—peddled as diagnostic. Again, names matter.
Universal screeners recently appeared on my naming radar. This term just invites misinterpretation.
What is “universal”? Does it mean a state or district is using it for all students, to detect all possible educational challenges and opportunities (some people are using these tests as doorways to gifted-and-talented programs)? What about “screener”? Does this term mean the test functions like a complete blood count, designed to check for a wide range of potential flags that could trigger a more targeted follow-up test?
Unfortunately, there are no answers to these questions. The name “universal screener” appears to mean something different to each person who uses it. This amorphous terminology creates confusion about the way these screeners are supposed to be used and the way they’re actually used. Originally conceived as a way to identify students with dyslexia—and that’s a good thing—universal screeners’ list of stated purposes and uses is as long as my Trader Joe’s shopping list.
At a recent Council of Chief State School Officers state collaborative meeting of state assessment leaders and test company representatives, Daniel Mix and Molly Buck from Curriculum Associates showed the diverse and often mutually exclusive purposes for which some states use these tests. I would not have interpreted “universal” to mean its use varies from state to state. If these tests are truly universal, their use should at least be similar, if not identical, across states. To hammer home this confusion, one state just passed a universal-screener law that requires the use of a “norm-referenced, formative assessment.” I swear, they are just trying to provoke me!
The Way Forward: Assessment Descriptions
If I had my way, we’d get rid of most of these assessment terms. Users, test companies, and others would have to describe their specific use-cases and outline the evidence that their tools can indeed be used in those ways without serious unintended negative consequences.
Even early in the development phase, test developers should supply some evidence of appropriate design, such as field tests and/or cognitive labs. They could say, for example, that teachers can use a rich performance task to provide information about how well students are able to use scientific practices to make sense of an ecological phenomenon in a recent unit of instruction. They don’t have to call it an interim or unit-based test, but they should have to provide evidence indicating why this task is likely to yield valid inferences about what students know and are able to do as they study ecology.
I’m not naïve enough to think that we’ll get rid of assessment names. Changing them over and over until we find the perfect, incorruptible label probably won’t work either, as tempting as it might be. In fact, a group of assessment experts recently told me they’re interested in changing the term “balanced assessment systems” because they’re worried it has been co-opted by the purveyors of interim assessments.
When I argue here for descriptions instead of names, I realize I’m asking for longer statements in this age of ubiquitous abbreviations. But we owe it to those we serve, especially students, to be as clear as possible about what different assessments do, and that means being clear with our language.
When educators use assessments with labels such as “formative” or “diagnostic,” they expect to get information that can support instructional decisions. When they do not, they may come away with a mistaken sense of what formative or diagnostic actually means. Students lose out as well: when teachers think they are gleaning prescriptive information from certain assessments, students can be misplaced into instructional interventions not aligned with their learning needs.
So when you tell me you’re offering me an assessment that is “instructionally useful,” please define what you mean and describe the evidence that your test can provide this critical information. When you tell me your test is going to “screen” students, please tell me who it’s going to screen, what it will screen for, and what evidence you have that it can do so effectively, with limited unintended negative consequences for students and educators. Building public understanding of assessments, and using them responsibly, depend on it.