Educators Need Clear Descriptions of the Intended Uses and Interpretations of Educational Assessments
Educational assessment has a naming problem, and the latest example is the current rhetoric regarding wide-scale use of diagnostic assessment. Naming problems likely affect other fields as well, but they are particularly rampant in education. When my colleagues, Marianne Perie and Brian Gong, and I first wrote about “interim assessment,” we conducted searches for the terms “formative,” “benchmark,” “predictive,” and several others. Exactly the same companies and the same products appeared in the results of every search; no wonder there is confusion and misuse of important terms. We tried bringing some clarity to the various labels used to describe the range of non-summative assessments. We were not successful because, as I discuss below, labels are too vague to convey the meaning educators desperately need.
In light of the current COVID-caused school disruptions, many policymakers and testing proponents are pushing for the large-scale administration of “diagnostic” assessments to gauge the “lost” learning and to evaluate the increase in achievement gaps. For example, Education Reform Now (ERN) recently released a brief calling for the wide-scale use of diagnostic assessments next fall. They included a table of 15 different assessments from six providers, most of which do not even call their products diagnostic, but ERN basically urged their readers to just pick one or more and get going.
Unfortunately, pushing for large-scale “diagnostic assessments” could lead to confusion and misuse. I am not just an assessment geek fussing over terminology; rather, I am concerned that labels such as “diagnostic” are not specific enough to clearly describe intended uses. Instead of relying on vague labels, we all would be better served if specific use cases were described and assessment suppliers provided evidence their products could meaningfully support such uses.
Diagnostic Assessment in Education
Diagnostic assessment in education historically has been associated with special education. Once a student has been “screened” and determined potentially eligible to receive special education services, they often undergo a series of finer-grained tests to diagnose specific learning disabilities and hopefully design appropriate interventions for individual students.
Additionally, many early reading and literacy assessments have been designed to help identify and address challenges before students develop serious reading difficulties. The results of these diagnostic tests help teachers implement specific interventions designed to help the student benefit from instruction.
More recently, cognitive diagnostic models (CDMs) have capitalized on complex psychometrics along with precise and narrow domain definitions to help “diagnose” which knowledge and skills students have “mastered” and which they still need support with.
The important point, which is especially relevant for this fall, is that it does not make sense to invest the time necessary to diagnose a learning issue unless it is followed with a specific plan to improve that student’s learning.
I urge assessment providers, policymakers, and others to define the intended use cases as specifically as possible, as illustrated below:
- State and district leaders will use what they ascertain about learning shortfalls and learning gaps to address differential resource needs.
- School leaders can use information about learning shortfalls and performance gaps to support organizational and professional learning needs in the school.
- Teachers will use the information about students’ key precursor concepts and skills to help prepare appropriate materials and strategies to help students succeed in the first instructional units of the school year.
Each of these examples of use cases speaks to a different assessment design and interpretative framework. Many of the assessments being proposed are more appropriate for decisions at the aggregate level by state, district, and school leaders rather than the individual student level by teachers. For instance, my colleague Will Lorié has written about how profiles might be derived from large-scale assessment data to help address class- or school-level strengths and weaknesses.
Whatever the use case, the utility of any assessment employed is contingent upon the effectiveness of the decisions and actions that follow from the interpretation of the results. For example, a school-level intervention might be something structural like procuring a new math curriculum more conducive to online instruction and providing teachers with intensive professional development about how to scaffold students up to grade-level content while avoiding an overreliance on remediation.
As in medicine, diagnostic tests in education must be accurate and precise to avoid incorrect prognoses and treatments. Most of the rhetoric associated with potential “diagnostic” assessments in education is focused on informing instructional interventions. Unfortunately, the large-scale assessments promoted as “diagnostic” are not capable of supporting the types of instructional interventions necessary to help each student make up the learning losses from this past year, and promoting them for that purpose could be a real disservice to many teachers and students.
Educators and school leaders are trying to figure out how to deal with extraordinary school interruptions this year and an uncertain educational future. They are desperate for tools and resources to help meet these needs. The allure of an assessment that can be administered quickly and produce results to inform teachers, school leaders, and policymakers could be too tempting to resist. Except it is fiction. It is unfair to dangle false hopes in front of educators who are trying to do what is best for kids.
I am not ruling out the parsimonious use of some district or state assessments to inform resource allocation and related decisions, but we need to be clear that calling an assessment “diagnostic” or “instructionally-useful” does not make it so. Assessment leaders must be specific about the uses assessments can and cannot support. In fact, this would be a welcome change that should last well beyond the current pandemic.
Supporting Instruction and Learning
I previously described five features that might enable assessment results to inform and improve instruction. Coherence with the intended curriculum and providing results at the right grain size are most relevant as we think about assessment next fall. This means that assessments closest to the classroom, tied to the enacted curriculum, must be the highest priority for next fall. My colleague, Carla Evans, offers more detailed suggestions in her recent post. These assessments ideally will already be part of the high-quality curriculum, but if not, teachers and local educational leaders will need time and support to both create these contextualized pre-assessments and improve their formative assessment strategies. Our students will benefit if we keep these uses at the forefront rather than trying to make sense of vague labels.
I am indebted to my colleagues, Brian Gong, Will Lorié, and Charlie DePascale, for pushing my thinking and hopefully improving this document. Of course, any shortcomings are my responsibility.
Note: The discussion of use cases in this post is drawn from a forthcoming paper as part of a larger set of resources sponsored by the Council of Chief State School Officers (CCSSO).