Supporting Assessment Design and Use

Communicating Clearly to Support Better Assessment Design and Use

Assessment design is a complex endeavor requiring clear communication. It involves negotiating multiple factors (e.g., technical, contextual, policy) that, depending on how they are prioritized, can lead to dramatically different results. Too often, test users are forced to rely on the vast array of psychometric labels and vocabulary we use to describe assessment types (e.g., interim, diagnostic), characteristics (e.g., balanced, coherent), and practices (e.g., predict, inform, track) to make meaning about assessment tools and processes. These words and phrases are often thrown around as if they have a common, well-understood definition, but we know that is not the case.

As with most things, an individual’s background and experiences can influence how even the most common labels are interpreted. For example, an educator who has only ever administered one interim assessment product may associate the term “interim assessment” with features, reports, or uses specific to that test, rather than recognizing it as a broad category of assessments. Consequently, the meaning you intended when using the term “interim assessment” may not be the meaning your audience takes away.

There are many terms I could use to illustrate this point, but I want to focus on a common assessment design trifecta – purpose, uses, and claims. In our field, it is frequently stated that assessments must be designed to reflect the intended purpose and uses and to support desired claims about student (or aggregate) performance. In my mind, the meaning of this phrase is clear, as is the definition of each of these words:

Purpose – the primary reason for developing the assessment, which may be framed as the question the assessment is intended to answer or the decision/action the assessment is intended to support (e.g., issue a driver’s license; prioritize professional development initiatives)
Use(s) – how the designer intends to apply the assessment results consistent with the purpose (e.g., evaluate whether a learner meets minimum expectations required to drive; identify math domains and/or content standards that are problematic for all students within a district)
Claim(s) – what we want or need to be able to say with the assessment results to use them as intended (e.g., Erika demonstrates sufficient understanding and performance of the rules and practices necessary to operate a car safely and effectively; across all grade levels, students are not demonstrating the ability to analyze text)

Of course, others may define and operationalize these words differently. My Center colleagues have been known to take up much of a two-day staff meeting arguing about definitions. In fact, some of you may be thinking, “her Purposes are Uses and vice versa.”

In practice, statements of purpose and use are commonly conflated. For example, would you consider “predicting future performance” a purpose or a use? How about educator evaluation or identifying students for remediation? Often it seems as if the words “purpose and use” are grouped together out of habit, not because doing so adds any value or meaning to what is being said. They almost seem to exist as one big word – purposeanduse – like in that great children’s song “LMNO” by They Might Be Giants.

Although I’m making a big deal about it, in the end, it doesn’t matter if I call something a purpose and you call it a use if there is a shared understanding of the type of information needed to support assessment design and why it is important. That said, if practitioners can’t agree upon the meaning of these words, what value do we expect them to have to others when used in isolation?

Building Understanding to Support Assessment Design

I often use my mom and kids as test subjects to see how well I can describe the work that I do clearly and precisely. They might ask me about my day, what I am working on, or even what’s for dinner, and I use whatever they ask as a jumping-off point to see whether I can explain an idea or problem in a way that makes sense to them.

My strategies for engagement vary but often include witty analogies (IMO); references to music, movies, or food; and strategic swearing – which tends to both hold their attention (I have older teen boys) and quickly reflect my level of frustration or concern related to the topic of discussion.

While it’s hard to judge the effectiveness of my efforts from a nod and grunt (a typical 16-year-old response), I often get useful feedback by simply asking “Does that make sense?” or “What do you think?” Our attempts to simplify the communication of complex assessment concepts and principles do a disservice if they result only in nods, grunts, or checkbox-based behaviors (e.g., “Specify a purpose and the uses – ✓”) rather than in improved understanding and decision making.

Unfortunately, we don’t have the opportunity to engage in curse-laden discussions of the type I have found to be so effective within my family. To echo the advice in a post by my colleague Scott Marion, we need to remove ambiguous labels and vocabulary from our communications about assessment and focus on the details that are really important. We need to start teaching people to think like psychometricians rather than use language that assumes they already are.

Instead of asking for statements of purpose and use, for example, we should pose direct, intuitive questions that get at the heart of assessment design, as illustrated below:

Why do you need an assessment? I want to provide educators with a tool for evaluating students’ ability to analyze text after a unit of instruction.
Who are the results for? Teachers in our district.
What do you want to be able to do with the results? Help educators monitor the effectiveness of teaching strategies focused on text-dependent analysis. Identify where students are struggling to engage in analysis.
What question(s) do you want the results to answer? Do students understand how analysis differs from summarization? Can students identify and use the information needed to engage in text analysis?
What does the assessment need to look like or do to answer those questions? It would need at least two age-appropriate texts that lend themselves to analysis, a prompt that clarifies the need to analyze, and a scoring rubric that clarifies the response expectations, etc.

Taking the time to think through basic questions such as these can help test users understand and articulate the type of assessment they need prior to development or selection. By contrast, a request to “clarify purpose and use” may elicit the common “measure students’ understanding in order to inform instruction,” which, as my colleague Nathan Dadey has stated, holds little value for informing test design or use.

In practice, we do not get to evaluate how people are interpreting the labels and language that we use so frequently. Therefore, we should all avoid hiding behind labels and work to be as clear as possible in saying what we mean and meaning what we say. If we don’t, we are not helping anyone, no matter how much we swear.