How Much is Enough?
Sufficiency Considerations for Competency-Based Assessment Systems
Many schools have turned to competency-based education for meeting both equity and excellence goals. Competency-based education requires students to demonstrate mastery of key knowledge and skills rather than merely meeting some passing score “on average.”
Local assessment data are often used to evaluate student mastery of identified competencies. There are many measurement challenges that arise when using assessments to support decisions about students’ competence. This blog focuses on one—sufficiency.
Sufficiency is closely related to generalizability, the measurement analogue to transfer in learning. Generalizability quantifies the degree to which performance on an assessment represents the student’s knowledge and skills if we could have administered all possible assessments of the same learning targets under all possible conditions to that student. In other words, while we care how well students perform on a single assessment, we care much more about whether the assessments provide credible evidence that the student really knows and can do what is being claimed about the full domain. Generalizability is a concern for all instruction and assessment programs, but it is a greater concern for competency-based systems designed to declare that students have “mastered the competency” so they can move on without having to demonstrate mastery of the same competency again.
Sufficiency then is a judgment about having enough credible evidence to support the claims, uses, and decisions that result from assessments. Sufficiency refers to both the quantity and quality of assessment evidence. We outline several considerations to help educators and others think about the evidence necessary to support educational decisions.
Sufficiency Consideration #1: What Claims are You Trying to Support?
“Students who exceed the cutscore on this assessment have ‘mastered’ the knowledge and skills associated with Competency A” is an example of a claim that could be made using a classroom summative assessment. In the case of a local assessment system composed of many assessments, the claim could be that “students who exceed the mastery cutscore on most (or all) of these summative assessments are proficient in Grade 3 math.” These two claims are at different grain sizes, which has implications for the amount and quality of information necessary to support decisions related to these claims. More comprehensive (i.e., larger) units of analysis require more information because the target of generalization is larger.
Sufficiency Consideration #2: How do You Intend to Use the Results?
Claims flow directly into use. Formative assessments are used to provide feedback to teachers and students in the moment so they can adjust their teaching and learning, whereas summative classroom assessment results are used for grading, reporting, and competency determinations. The differences in these and other use cases must be factored into sufficiency evaluations.
Sufficiency Consideration #3: What Decisions are Made Based on the Assessment Results?
We need to be clear about the decisions that will be made from the assessment results because the amount and quality of information required depend on the consequences of those decisions. For example, if the results of an assessment will determine whether a student can move on to the next class, users must have enough information to defend almost all of these decisions.
Sufficiency Consideration #4: What is Your Tolerance for Being Wrong?
The results of all assessments contain error, which is inversely related to the amount of measurement information available. Therefore, it is critical to consider one’s tolerance for being wrong and in which direction. Specifically, would it be better to err on the side of thinking that a student has mastered the knowledge and skills associated with an identified learning target when they actually have not, or the converse? The tolerance for being wrong is closely related to the decisions and potential consequences.
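One way to make this trade-off concrete is to estimate how often a student's observed average would land on the wrong side of the cutscore. The sketch below assumes a simple model in which each assessment score equals the student's true score plus independent, normally distributed error. The function name, the 0-100 scale, the cutscore of 70, and the error standard deviation of 10 are all hypothetical values chosen for illustration, not figures from any real assessment.

```python
import math

def prob_observed_at_or_above_cut(true_score, cut, error_sd, n_assessments):
    """P(observed mean >= cut), assuming independent normal errors (illustrative model)."""
    # The standard error of the mean shrinks with the square root of the number of assessments.
    sem = error_sd / math.sqrt(n_assessments)
    z = (cut - true_score) / sem
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))

CUT = 70.0        # hypothetical mastery cutscore
ERROR_SD = 10.0   # hypothetical error standard deviation for one assessment

for n in (1, 4, 9):
    # A student whose true score (65) is below the cut but who is declared a master.
    false_mastery = prob_observed_at_or_above_cut(65.0, CUT, ERROR_SD, n)
    # A student whose true score (75) is above the cut but who is not declared a master.
    false_non_mastery = 1 - prob_observed_at_or_above_cut(75.0, CUT, ERROR_SD, n)
    print(f"n={n}: P(false mastery)={false_mastery:.3f}, "
          f"P(false non-mastery)={false_non_mastery:.3f}")
```

Under these assumptions both error rates fall as assessments are added. If one direction of error is less tolerable than the other, the same calculation can be rerun with a stricter or more lenient cutscore to see how the two rates trade off.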
Sufficiency Consideration #5: Is the Assessment Part of a System or a Single Assessment?
Coherently designed assessment systems could provide more accurate and reliable information than a haphazard collection of assessments. Therefore, systems should be designed to maximize sufficiency while striving for efficiency.
Many still ask, “but how many assessments do I need?” Of course, the answer depends on the breadth and scope of these assessments, as well as the considerations discussed above. We find from our generalizability analyses of local assessments as part of New Hampshire’s PACE pilot that our guidance is not that different from the recommendation by Shavelson, Baxter, and Pine (1992) of 6-12 performance assessments to achieve a stable estimate of a student’s performance. Yet, this endeavor quickly becomes a multiplication problem, especially in elementary grades when the same teacher would have to design, administer, and score all of the assessments.
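For readers who want to see where a number like this can come from, generalizability theory projects how stable a student's average becomes as tasks are added. For a simple design in which every student completes the same tasks, the projected coefficient is the person variance divided by the person variance plus the relative error variance over the number of tasks. The sketch below is a minimal illustration of that projection. The variance components are invented for the example; they are not estimates from the PACE analyses or from Shavelson, Baxter, and Pine (1992), and the function names are hypothetical.

```python
def g_coefficient(var_person, var_rel_error, n_tasks):
    """Projected generalizability coefficient for a person-by-task design."""
    # Averaging over more tasks divides the relative error variance by n_tasks.
    return var_person / (var_person + var_rel_error / n_tasks)

def min_tasks_for(target, var_person, var_rel_error, max_tasks=50):
    """Smallest number of tasks whose projected coefficient reaches the target."""
    for n in range(1, max_tasks + 1):
        if g_coefficient(var_person, var_rel_error, n) >= target:
            return n
    return None  # target not reachable within max_tasks

# Hypothetical variance components (assumptions for illustration only).
VAR_PERSON = 0.30      # variance due to real differences among students
VAR_REL_ERROR = 0.70   # person-by-task interaction plus residual error

for n in (1, 3, 6, 12):
    print(f"{n:2d} tasks -> coefficient {g_coefficient(VAR_PERSON, VAR_REL_ERROR, n):.2f}")

print("Tasks needed for 0.80:", min_tasks_for(0.80, VAR_PERSON, VAR_REL_ERROR))
```

With these made-up components the projection reaches 0.80 at 10 tasks. Different variance components, which depend on the tasks, the scoring, and the student population, would give a different answer.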
Further, recommending a specific number of assessments is based only on an average. For example, if a student excels on the first few assessments, we probably need fewer assessments than for students close to the cutscore to determine if they have mastered the intended learning targets.
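The dependence on distance from the cutscore can be sketched with a rough calculation: keep adding assessments until the gap between the student's average and the cutscore is at least a chosen number of standard errors. This is a simplified illustration, not a recommended decision rule. It assumes independent errors with a known standard deviation and an average that stays put as assessments are added, and the function name, score scale, and threshold are all hypothetical.

```python
import math

def assessments_needed(mean_score, cut, error_sd, z=1.645, max_n=100):
    """Smallest n at which |mean - cut| is at least z standard errors of the mean.

    Returns None when the student sits exactly on the cut or when more than
    max_n assessments would be required.
    """
    gap = abs(mean_score - cut)
    if gap == 0:
        return None
    # Solve gap >= z * error_sd / sqrt(n) for n.
    n = math.ceil((z * error_sd / gap) ** 2)
    return n if n <= max_n else None

CUT = 70.0        # hypothetical mastery cutscore
ERROR_SD = 10.0   # hypothetical error standard deviation for one assessment

for mean in (90.0, 75.0, 73.0):
    n = assessments_needed(mean, CUT, ERROR_SD)
    print(f"average {mean:.0f}: about {n} assessment(s) needed")
```

Under these assumptions a student averaging 90 is clearly above the cut after a single assessment, while a student averaging 73 would need many more, which is why a single recommended number only describes the typical case.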
Concepts such as generalizability and sufficiency sound complex. The following suggestions are intended for those trying to wrestle with these issues in practice:
- Be clear about the intended uses.
If your focus is on formative feedback, don’t worry about sufficiency too much. If you are making determinations about students’ “competence” of specific learning targets, you need to attend to these considerations.
- Be clear about your claims, especially your transfer/generalizability claims.
If you want to claim that student competence extends beyond the performance on the single assessment or set of assessments, then carefully evaluate whether the set of assessments adequately represents the target of your inferences (e.g., argumentative writing) and provides enough information to support your decisions.
- Be clear about your tolerance for being wrong.
The more concerned you are about being wrong (e.g., denying a student a chance to move on), the more important it is that you have sufficient information to support the decision.
- Carefully balance having too little information with having too much.
This balance is especially important if the excess information comes from assessments administered separately from instruction.
We hope this discussion of sufficiency provides an initial framework for educators and educational leaders designing their local assessment systems to evaluate competencies or other important learning goals.
Brennan, R. L. (1992). Elements of Generalizability Theory. Iowa City, IA: American College Testing Program.
Cronbach, L. J., Linn, R. L., Brennan, R. L., & Haertel, E. (1997). Generalizability analysis for performance assessments of student achievement or school effectiveness. Educational and Psychological Measurement, 57, 373–399.
Domaleski, C., Gong, B., Hess, K., Marion, S., Curl, C., & Pelzman, A. (2015). Assessment to support competency-based pathways. Washington, DC: Achieve.
Evans, C. M., & Lyons, S. (2017a). Application of generalizability theory to classroom assessments in a school accountability context. Paper presented at the annual meeting of the National Council on Measurement in Education, San Antonio, TX.
Evans, C. M., & Lyons, S. (2017b). Comparability in balanced assessment systems for state accountability. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12152
Marion, S., & Leather, P. (2015). Assessment and accountability to support meaningful learning. Education Policy Analysis Archives, 23(9). Retrieved from http://dx.doi.org/10.14507/epaa.v23.1984
McClarty, K. L., & Gaertner, M. N. (2015). Measuring mastery: Best practices for assessment in competency-based education. Washington, DC: American Enterprise Institute.
Shavelson, R. J., Baxter, G. P., & Pine, J. (1992). Performance assessments: Political rhetoric and measurement reality. Educational Researcher, 21(4), 22–27.
Shepard, L. A., Penuel, W. R., & Pellegrino, J. W. (2018). Using learning and motivation theories to coherently link formative assessment, grading practices, and large-scale assessment. Educational Measurement: Issues and Practice, 37(1), 21–34.
Wilson, M. (2018). Making measurement important for education: The crucial role of classroom assessment. Educational Measurement: Issues and Practice, 37(1), 5–20.