Five Essential Features of Assessment for Learning

A Look at Fulfilling Hopes for Innovative Assessments and Instructional Utility

It may sound innovative to claim that commercial interim assessments support instruction, but simply saying it doesn’t make it so. 

The Innovative Assessment Demonstration Authority (IADA) under the Every Student Succeeds Act (ESSA) was created to give states and school districts the chance to use assessments in support of richer learning opportunities, including those tied to identified competencies. The IADA is not restricted to innovations that directly support learning. Some states might pursue new ways to provide accountability scores, to use game-based assessments to better engage students in the summative assessment, or to provide information on key cognitive processes in addition to performance outcomes.

That said, I focus my remarks on assessments designed to support the learning of the students participating in those assessments. Regular readers of our CenterLine blog know that assessments must be designed and validated for specific purposes and uses. I am motivated to write this post out of concern that commercial interim assessments, generally not administered more than two or three times per year, are being proposed as “innovative” assessment systems in support of instruction and learning.

What Must We Accomplish for an Assessment to Truly Support Learning and Instruction?

Assessment for learning is a commonly used phrase to describe the idea that assessments can be employed in service of learning rather than simply measuring what students have learned. However, simply naming an assessment as being for learning does not make it so. We must carefully unpack what it means for an assessment to support learning and instruction. This is where a theory of action can be most critical for informing our thinking. Having employed such theories of action in multiple settings, I suggest the following assessment features can support instructional utility. There are likely other things that can be added to this list, but I believe these qualities are critical for an assessment to be called an assessment for learning:

  1. Coherence With the Enacted Curriculum. Assessments must be tied to the specific curriculum and/or learning progressions used to guide instruction. Standards are too distal (i.e., end-of-year targets) and generally too vague to guide instruction. Therefore, if assessments provide feedback only relative to state content standards, it is unlikely teachers will be able to do more than some general re-teaching. State content standards define end-of-year expectations, but do little to describe all of the knowledge and skills that lead to these large-scale content standards. Improving learning requires task-specific feedback. Assessments tied to state standards are not specific enough to guide such feedback. It is worse still when the assessments are not tied to the state’s specific content standards at all.
  2. Items and Tasks that Support Deeper Thinking. If our goal is to create an assessment system to both measure and support deeper thinking, we must ensure our test questions elicit evidence of the complex thinking we desire for students. This objective is especially important in competency-based education systems where students are expected to demonstrate their learning through rich performances. Multiple-choice items alone cannot meet this requirement. Further, test items vary considerably in quality, so it should go without saying that high-quality items must be employed to support rich instructional initiatives.
  3. Results that are at the Right Grain Size to Support Useful Feedback. Following from the first criterion, the results must be presented at a grain size at which teachers and students can take action. Telling a student they are weaker in argumentative than in narrative writing might be somewhat useful, but not nearly as useful as letting a student know their thesis statement does not properly outline the forthcoming argument. This second case is at a grain size from which students and teachers can act to change performance.
  4. Results that are Timely. We know from the formative assessment literature that feedback is best when it occurs soon after or even during the performance. The speed of returning results is related to the grain size and the connection with the enacted curriculum. Most people enjoy the instant gratification of getting mid-year or end-of-year test results back quickly, but does it really make a difference if the results point out remediation needs from instruction that occurred months ago? Sure, it might be important, but waiting a week or two would not really make a difference. On the other hand, finding out in the middle of a unit that a student is struggling with a key concept, before misconceptions solidify, allows students and teachers to take corrective action.
  5. Results that Inform Instruction. The formative assessment literature suggests the results must be presented in a way that helps teachers and students understand what to do next. Scores and subscores tied to distal content standards do not carry enough meaning to guide instructional moves. I have long questioned whether teachers understand how much harder or better to teach to raise a child’s score twelve points, for example, on some scale they do not understand. On the other hand, there is little question that close examination of student work is one of the best ways to help teachers and students understand the strengths and shortcomings of a particular piece of work.

I fully support state and district efforts to innovate their assessment systems in service of more meaningful learning and assessment experiences for students. The five characteristics of assessments offered above are a starting point for evaluating the degree to which specific assessments proposed as part of the IADA may be able to fulfill hopes for instructional utility. I think a fair reading of these qualities against the characteristics of most commercially available interim assessments suggests they would fall short on all qualities except the timeliness of the results. If such assessments are being proposed as innovative and instructionally useful, I would like to see evidence, tied to a defensible theory of action, that supports such claims.