Instructing & Assessing 21st Century Skills: Key Measurement & Assessment Considerations

Jul 16, 2020

Research and Best Practices: One in a Series on 21st Century Skills

For the full collection of related blog posts and literature reviews, see the Center for Assessment’s toolkit, Assessing 21st Century Skills.

We use readily available, well-defined tools to measure physical objects and constructs: tape measures, rulers, scales, and mile markers. However, as we have discussed in other posts in this series, measuring student proficiency in 21st century skills such as critical thinking, collaboration, complex communication, and/or self-directed learning is not as straightforward as measuring height, length, weight, or distance. This post highlights the challenges the educational measurement community must address, which are critical obstacles to instructing and assessing 21st century skills, such as:

  • Poorly defined constructs.
  • Limited understanding of how students develop these skills over time (e.g., learning progressions).
  • The interactions and interrelationships among the various 21st century skills.
  • The appropriateness of separating the skills from content and context.

In addition to the measurement challenges faced by all large-scale and classroom assessments, the assessment of 21st century skills faces two major hurdles: (1) sufficiency and generalizability considerations; and (2) cultural validity and equity considerations. Each is discussed in turn below.

Sufficiency & Generalizability Considerations

Educational measurement refers to the use of assessments and related scores to infer what students know and can do at particular levels of cognitive demand with respect to identified learning targets. Inferences from student assessment results are used to support claims for particular uses—student accountability, school/district accountability, state report cards, etc.

As Scott Marion and Carla Evans explained in more detail in a previous post, sufficiency is a judgment about having enough credible evidence to support the claims, uses, and decisions that result from assessments. Sufficiency refers to both the quantity of assessment evidence and the quality of assessment evidence.

Trying to make inferences about what a student knows and can do in relatively narrow and well-defined content areas (e.g., Grade 3 mathematics, Grade 6 reading) is quite challenging. It minimally requires:

  • Clear construct definitions—What is it you want to measure?
  • Research-based end of grade level, end of grade span, or at least end of high school proficiency targets—What are students expected to know and do, and at what level of cognitive rigor at specific markers in time?
  • Identified claims and uses—What is it that you want to say about students, and for what purpose?
  • Adequate domain mapping—What collection of evidence is adequate to support identified claims and uses?
  • Reliable scoring—What degree of score reliability is adequate to support identified claims and uses?
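
To make the reliable-scoring requirement above concrete, here is a minimal sketch of one common check: agreement between two raters who scored the same student work, summarized with Cohen's kappa. The rubric scores and the names teacher_1 and teacher_2 are hypothetical, and kappa is only one of several possible reliability indices, chosen here for illustration rather than taken from the original post.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of samples.")
    n = len(rater_a)
    # Proportion of samples on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's score distribution.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical score for every sample
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1-4) from two teachers on eight work samples.
teacher_1 = [3, 2, 4, 3, 1, 2, 3, 4]
teacher_2 = [3, 2, 3, 3, 1, 2, 4, 4]
kappa = cohens_kappa(teacher_1, teacher_2)
print(round(kappa, 2))
```

How high such a value needs to be depends on the claims and uses identified for the assessment; a statistic like this describes scoring consistency only and says nothing about whether the construct itself is well defined.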

In the context of 21st century skills, however, there are no clear construct definitions, empirically-validated learning progressions are limited to non-existent, and there are few research-based proficiency targets.

Sufficiency is closely related to generalizability, which represents a broad claim about what a student knows and can do across contexts and conditions. Generalizability is essentially the measurement analogue to the concept of learning transfer and requires sufficient evidence as support (Marion & Evans, 2018).

How much evidence is enough for a teacher to make a broad claim or generalization about a student, such as: The student is an effective collaborator? Furthermore, does the evidence support the inference that a student is an effective collaborator across multiple settings (in school/out of school), conditions (virtual/in-person), content areas, and cultures?

For example, is it possible to collect enough information to make a broad claim that is generalizable across settings, conditions, and content areas for a 21st century skill such as critical thinking?

It is highly unlikely that any single project or assessment could elicit evidence on all dimensions of critical thinking (interpretation, analysis, inference, evaluation, and explanation), let alone a student’s ability to think critically in multiple content areas such as science, art, history, mathematics, and so on. Accumulating a body of evidence over time and across content areas, such as in a student portfolio or senior exhibition, could allow for broader claims to be made. However, schools and districts will surely need to prioritize claims and pare down their typically comprehensive and all-inclusive priorities and graduate profiles.

Even if it were possible, would such a broad claim be useful? Is it beneficial to claim that a student is an effective collaborator, self-directed learner, complex communicator, and/or critical thinker in all contexts, content areas, and conditions? Or would more specific and contextualized claims be more useful for students and parents?

  • For example, after receiving instruction on elements of effective collaboration in a project on transfer of energy, Susie demonstrated her knowledge and skill in helping the group to plan and make decisions, and later reflected on how she could better adjust her efforts next time to help the group accomplish its goal.

This rich, contextualized and qualitative claim is more akin to what one might find in a narrative report card statement. This type of claim seems useful to students and parents for understanding student learning in the present and future goal setting.

Specific and contextualized claims may also protect against potential conflicts when assessing 21st century skills. For example, there are situations where scoring high on critical thinking does not earn a student high marks on collaboration (and vice versa). Depending on the context and goal of the task, 21st century skills are sometimes at odds with one another.

Cultural Validity & Equity Considerations

Cultural validity refers to how students’ cultural background might differentially influence their responses and interpretations of items or tasks (Ercikan & Oliveri, 2016). Measures of traditional academic skills such as reading, mathematics, and science are less culturally dependent than measures of 21st century skills, as evidenced by the use of international tests such as TIMSS and PISA (Care, Kim, Vista, & Anderson, 2018). What is culturally appropriate in terms of interpersonal competencies (e.g., collaboration, communication) and intrapersonal competencies (e.g., self-direction) likely varies across cultures in ways that are not well understood. It is important to be cautious about making sweeping claims about the accuracy or adequacy of a 21st century skill assessment without local confirmation of the validity of inferences from assessment results.

Additionally, it is unclear to what extent we should expect similar progressions of 21st century skills for all student groups, such as students with disabilities or English learners. There is simply little to no research yet available in this area to support broad claims with certainty. This is problematic from an equity perspective because assessments should be fair and unbiased for all students.


Measuring students’ demonstration of 21st century skills is complex—unlike measuring height, weight, or width. The purpose of this post was to lay out the measurement and assessment considerations related to 21st century skills, particularly around the claims and uses that can be supported with evidence. We call on the measurement community to partner with educators and researchers to overcome the obstacles described in this post. We conclude that specific and contextualized claims are more valid, reliable, fair and useful than broad and generalizable claims. Designing assessments to support those claims is a good place to start.

The final post in this series provides guidance to educators on designing 21st century skills classroom assessments, and how to use the information supplied by such assessments for instructional and evaluative purposes.