
Tension Between Educational Measurement and K-12 Assessment

Reflections on NCME 2023

I recently attended the annual meeting of the National Council on Measurement in Education (NCME) and saw many signs of a positive development: an increasing recognition that educational measurement is, in the end, about serving stakeholders, particularly students in K-12 public schools. 

There were quite a few sessions dedicated to issues such as increasing the instructional utility of assessments, considering equity issues in systems design, improving collaboration between vendors and state agencies, and innovating with through-course assessment systems.

But I was also reminded that there are still striking disconnects in our work that have implications for the impact our field can have on K-12 learning, assessment, and accountability. I want to briefly discuss four that stood out for me this year and what we might learn from them as a community of measurement professionals. 

  1. Accountability Constraints vs. Innovation Desires

Not surprisingly, the forced marriage between federal accountability systems and innovation efforts is a bit awkward, sometimes even dysfunctional. While initiatives like the Innovative Assessment Demonstration Authority (IADA) grant program are laudable in principle, many of us at the conference noted that the expectation to innovate while also demonstrating close technical comparability to existing, less innovative assessment systems constrains innovation: a paradox indeed.

Innovation is also expensive. It requires investments in data infrastructures, delivery technology, new task design approaches, pilot studies, and so on. The costs of innovation depend only modestly on the number of students an assessment might eventually serve at scale, yet education funding depends heavily on that number. As a result, supplementary grant programs such as IADA, which are meant to spur innovation but come with zero federal funding, are problematic.

  2. Methodological Precision vs. Interpretational Hand-Waving

Studies in educational measurement that try to bridge theory and practice through empirical evidence can be an awkward blend of high-end scientific precision and low-end hand-waving about the practical implications of the results.

Over the years, I have attended many sessions where presenters shared rigorous investigations of complex statistical methods through simulation studies, motivated by real problems in education. But when they turned to interpreting effect sizes in practical terms, their descriptions became decidedly broader, more directional, and more uncertain.

This doesn’t mean that such investigations shouldn’t be done, of course. But I do think that applied users may intuitively expect overall precision coupled with meaningfulness as the ultimate outcome of complex studies when, in fact, both precision and meaning are heavily contextualized.

  3. Assessment Innovations vs. Learning Innovations

I don’t expect all conversations about assessment innovations to be inspirational fireworks. But there is a big difference between talking about innovation principles or hypothetical ideas and demonstrating their power through scaled-up use cases with rich stakeholder data. At NCME, the former is definitely more common than the latter.

Put simply: if someone wants to learn about innovations in assessment, I don’t think they should go to conferences with an educational measurement or other psychometric focus. Our discipline remains chained to historical traditions that evoke prototypical framings of problems, which effectively hinders the discovery and exploration of novel solutions.

Should we be surprised if conversations about innovation that focus on issues such as technical comparability, form equating, differential functioning, standards compliance, validity theories, and so on rarely inspire us to think differently about the ways we do business? I don’t think so. 

  4. Methodological Advancement vs. Operational Advancement

Methodological advancement is certainly an important part of our work. As someone who grew up with and appreciates the analytic opportunities that diagnostic measurement models provide, I am truly amazed by the ongoing proliferation of psychometric research in this space. A lot of that research was showcased at NCME. 

But I ask this: Do we really need to conduct and/or showcase en masse the next evolution of simulation studies for hyper-complex diagnostic measurement models for educational contexts if we can barely get to the point where even “simpler” models are used meaningfully at scale (with a few laudable exceptions)? 

Disconnections in Assessment Work: A Path Forward

So what can we do as measurement professionals to ease some of these tension points? Here are a few simple ideas: 

  • Stop perpetuating unproductive methodological framings of the past for the sake of a disciplinary throughline while squeezing in only minor innovative practical ideas. Change the narrative instead. 
  • Shape new ways evidence can be collected, aggregated, and synthesized to provide differentiated, principled, and realistic stories about how we make sense of how to innovate with impact.
  • Push for scientific precision in contexts where it is truly required, but acknowledge the inherent imprecision that often overrides this precision in actual decision-making.
  • Evaluate consequences and risks thoughtfully through theories of change, and help others build mental models that foster intentional action informed by a deeper, more differentiated understanding of the ecosystems at play.
  • Explore the utility of existing modern models from the joint toolkit of data science, educational measurement, and machine learning rather than inventing unnecessary new twists on complex traditional models that cannot be reliably and efficiently estimated in operational practice.
  • Continue to reimagine and adapt the professional standards that shape our best practices, but do it more frequently (hey APA, AERA, NCME: we are still waiting for the update to the 2014 edition of our Standards!).
  • Engage in more research-practice partnerships, working across state, district, and vendor boundaries to develop realistic expectations for impact and implementation. 
  • Attend conferences dedicated to innovation in education such as SXSW EDU to understand more broadly how influential voices are shaping the perceptions and practices of learning on the ground.

If we don’t do these things soon, our profession risks losing its relevance for educational innovation and will come across as simultaneously sophisticated yet naïve.

Remaining insular and too traditionalist undervalues the insight of hardworking, competent professionals in states, districts, schools, and support organizations who have a much more nuanced and complex understanding of the realities of what makes school systems work on the ground. 

Moreover, this will do a disservice to our students who want to be empowered, engaged, and adequately prepared for a future of their choosing while technology, careers, and society are evolving rapidly. We need to get better at measuring what matters in the service of learning, not just measuring what we’ve always measured.