Measuring Durable Skills: Why States Need More Than a Test

Feb 25, 2026

Rethinking how states can credibly assess 21st century competencies

In a previous blog, I argued that states should not start with assessment when they try to gauge mastery of 21st century competencies, sometimes referred to as durable skills. That still holds. But it leaves an important follow-up question—one I hear frequently from state leaders:

If we don’t use a traditional test, then what does it actually look like to assess these durable skills in ways that are credible, understandable, and aligned with state policy goals?

This is where the conversation gets more nuanced—and more promising.

The good news is that states are not starting from scratch. There is already a growing landscape of tools, approaches, and policy levers that can support meaningful assessment of durable skills like critical thinking, communication, and collaboration. The challenge is less about whether we can assess these competencies and more about how to do so in ways that align with purpose, context and constraints.

What Is Durable-Skill-Based Assessment?  

At its core, durable-skill-based assessment is about what students can do with what they know. This has important implications for states. If we want to measure readiness for life beyond school, we need forms of evidence that reflect the complexity of real-world performance, not just isolated knowledge.

Traditional assessments tend to emphasize knowledge of academic content, often through multiple-choice and short-answer questions administered at a single point in time. Skill-based assessment shifts the focus toward whether students can apply their learning to solve problems and navigate real-world challenges.

Related: Review this collection of the Center’s work on assessing 21st century competencies (also known as durable skills).

A helpful way to think about this is through the analogy of learning to drive. Passing a written exam demonstrates knowledge. Passing an on-the-road test demonstrates the ability to apply that knowledge. But even that doesn’t capture the full picture. Real-world competence also depends on attitudes such as persistence, responsibility and motivation.

As a result, durable-skill-based assessment is not just about knowledge or even performance in isolation. It reflects the integration of knowledge, skills and attitudes within a given context.

In 4th grade math, this could look like students working with a partner (collaboration) to apply their knowledge and skills with multi-digit multiplication: determining how many items to order in bulk to stock a school store, calculating total quantities and costs, and then writing a justification for their purchasing decisions (communication).
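To make the arithmetic in that task concrete, here is a minimal worked version with hypothetical numbers (the pack size, order quantity, and price are illustrative, not drawn from any particular classroom):

```latex
% Hypothetical school-store order:
% 24 packs of pencils, 12 pencils per pack, $3.50 per pack
\[
\begin{aligned}
\text{Total pencils} &= 24 \times 12 = 288 \\
\text{Total cost}    &= 24 \times \$3.50 = \$84.00
\end{aligned}
\]
```

The written justification (communication) is where the durable skill shows up: students would explain, for instance, why 288 pencils matches the store's expected demand and why $84.00 fits the available budget.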

Expanding the Assessment Tool Landscape

Once we move beyond traditional tests, a wider range of assessment tools becomes available. Many of these tools are already familiar to educators but less visible in state policy conversations.

At a high level, skill-based assessment draws on evidence from what students produce, say, and do over time. This can include:

  • Student work such as performance tasks, capstone projects or portfolios
  • Observations and reflections from teachers, peers and students themselves
  • More structured technological tools such as simulations, interactions with AI agents or situational judgment tasks

What matters most is not any single tool, but how these tools are used together. In practice, the most credible approaches rely on multiple sources of evidence collected across contexts and over time.

This represents a shift in mindset. Rather than asking, “What score did the student get?” we begin to ask, “What body of evidence do we have about this student’s durable-skill development?”

From Evidence to Meaningful Judgments

Collecting evidence is only the first step. The more difficult—and more important—work is making sense of that evidence.

For complex competencies like collaboration, critical thinking or communication, no single task can provide a complete picture. Instead, states and systems should think about how to triangulate evidence across multiple contexts, bringing together different perspectives and artifacts to support a more credible judgment.

Equally important is the role of developmental progressions. Evidence only becomes meaningful when it is interpreted against a shared understanding of what growth looks like over time. What does emerging collaboration look like in elementary school? How does that differ from more sophisticated collaboration in high school?

While many of these progressions are still evolving, they provide a critical foundation for supporting consistent interpretation of student work, guiding instruction and feedback, and communicating expectations clearly to students and families.

Without this kind of shared framework, even rich evidence can be difficult to interpret in consistent and meaningful ways.

Start With the “Why”: Clarifying Use Cases

One of the most common pitfalls I see is jumping too quickly to tools before clarifying purpose. States pursuing durable-skill-based assessment are doing so for different reasons, and those reasons matter. Three common use cases are:

  • Improving teaching and learning at scale
  • Providing meaningful information to students and families
  • Creating portable credentials or diplomas that have value beyond K–12

Each of these goals leads to different design decisions.

For example, a system designed to support classroom instruction can prioritize rich, contextualized evidence. A system designed to produce statewide credentials may require more standardization and comparability. When these purposes are not clearly defined, systems can become overextended, trying to do too many things at once, and doing none of them particularly well.

The key is alignment. The purpose should drive the design—not the other way around.

State Policy Levers: A Broader Set of Options

In the earlier blog, I described a continuum of state involvement with just a few examples. Building on that idea, it is helpful to make the options more concrete.

States could use a range of policy levers to support the teaching and learning of durable skills, many of which do not involve traditional testing.

At one end, states could focus on building capacity and coherence, for example by:

  • Developing a Portrait of a Graduate to define shared expectations
  • Providing guidance, resources, and professional learning
  • Revising standards to more explicitly embed skills within content areas

As states move toward more formal signaling, they may:

  • Create competency-based diplomas, badges, or skills profiles
  • Require capstone projects as part of graduation
  • Include local indicators that describe how districts are teaching and assessing these skills

Further along the continuum, states may integrate skill-based evidence into more formal systems, such as:

  • Embedding performance tasks into statewide assessments
  • Adding indicators tied to skill demonstration into accountability systems

These choices are not simply technical; they are policy decisions about what the state values and how strongly it wants to signal those values.

Assessment Tradeoffs States Must Navigate

As states consider these options, it is important to recognize that there are no perfect solutions, only tradeoffs.

Several key tensions consistently emerge:

  • Comparability vs. flexibility
  • Validity vs. scalability
  • Innovation vs. burden
  • Local control vs. statewide coherence

These are not problems to eliminate, but realities to navigate. For example, increasing comparability often requires standardization, which can reduce the richness of tasks. Supporting local innovation can produce more meaningful learning experiences, but may introduce variability in how results are interpreted. Scaling any approach statewide inevitably raises questions about cost, capacity, and feasibility.

The work for states is not to avoid these tensions, but to make them explicit and make intentional choices about where to sit within them.

Moving Forward with Coherence

If there is one idea that cuts across all this work, it is the importance of coherence. Credible approaches to assessing durable skills are not built around a single instrument or initiative. They emerge from systems where:

  • Definitions, instruction, and assessment are aligned
  • Multiple sources of evidence are valued and used together
  • Expectations are transparent and shared
  • Uses of results are clearly defined and appropriate

States have more options than ever before to support this work. The challenge—and the opportunity—is to move beyond the idea that assessment must look like a traditional test.

The question is no longer whether we can assess these skills. The question is whether we can do so in ways that honor the complexity of learning while still meeting the needs of policy: credibility, clarity, and coherence.

Photo credit: Allison Shelley for EDUimages
