How We Can Generate Actionable Student-Level Information from the Summative State Assessment While Honoring Its Design
This post is based on an invited presentation Charlie DePascale made at the nineteenth annual Maryland Assessment Research Center (MARC) conference at the University of Maryland on November 8, 2019.
“Our teachers are thrilled that the new summative state assessment is so much shorter. Now, what additional student scores can we report from it to help them improve instruction?”
I cannot count how many times I have heard some juxtaposition of those two sentiments as states struggle to reduce the testing burden on schools while meeting the demand to provide more actionable information to teachers about the knowledge and skills of individual students.
The concept of generating actionable student-level information from an end-of-year summative state assessment is difficult to justify based simply on the timing of the state assessment. It should go without saying, although it must be stated repeatedly, that results from an end-of-year assessment are not useful for improving the instruction or current-year achievement of students who have completed the school year and moved on to another grade level and teacher.
The purpose of the summative state assessment should also limit the expectations that it will produce detailed information about the knowledge and skills of individual students. The primary purpose of the state assessment is to provide an accurate estimate of the percentage of students in a school and its relevant subgroups who are meeting the achievement standards established by the state for that grade level. The primary use of that information since the 2002 adoption of No Child Left Behind (NCLB) has been to populate various indicators within the school accountability system.
How Did We Get to the Point Where Student-Level Subscores Are Expected?
Although aggregate school and subgroup results have been the primary focus of summative state assessments, there has also been a parallel focus on individual student scores. NCLB, and the Improving America’s Schools Act (IASA) before it, required the reporting of individual student scores. The advent of student growth scores in the early years of NCLB produced a need for individual student scale scores in addition to achievement-level classifications.
In the later years of NCLB, the reporting of individual student scores from state assessments also became the primary weapon in combating the “honesty gap” in reporting on student achievement.
None of these uses of summative state assessment results, however, required the reporting of detailed information about the knowledge and skills of individual students. Although many states did report basic information about student performance in major content domains in English language arts (e.g., reading, writing) and mathematics (e.g., numbers and operations, algebra, measurement) in terms of raw scores and local norms, the focus of reporting at the student level was on the overall score and achievement level.
The increased demand for detailed reporting of student-level subscores can be traced to 2014-15 and the confluence of three elements:
- the creation of criteria for high-quality assessments,
- the federally-financed creation of next-generation state assessments,
- and the revised focus of USED Peer Review.
There were, consequently, increased demands placed on students and schools in terms of the time required to administer the new state assessments and the level of student knowledge and skills those assessments required. These demands also contributed to the call for more detailed student-level information.
You Cannot Get Blood From a Stone
Despite the best and valiant efforts of states and their assessment contractors to report student-level subscores, the fact remains that summative state assessments in English language arts and mathematics are not designed to produce such scores. We can begin the argument about why state assessments are not designed to produce student-level subscores with such low-hanging fruit as:
- most state assessments are constructed using models based on the assumption that the assessment is measuring a single, unified construct, and
- the unfortunate fact that the demand for short tests that measure higher-order thinking skills has resulted in some summative state assessments that generate a large number of score points from student responses to 30 or fewer questions spread across the entire set of content standards.
Beyond those facts, however, is the rarely-discussed premise that we have never really defined what it is that we are measuring on summative state assessments in English language arts and mathematics. Let’s look at what we know about what is being measured on state assessments:
- Each of the items can be mapped to challenging content standards adopted by the state.
- The assessment as a whole is aligned to those state content standards as defined by accepted measures of test alignment.
- The assessment produces scale scores that correlate well with external measures such as course-taking, grades, and tests in other grade levels.
- Those scale scores are used to produce classifications of student proficiency along a continuum defined by the combination of the median of content-based judgments of a group of educators and policy-driven impact data.
But what is this thing that we call proficiency in English language arts and mathematics?
- It does not reflect student mastery of the set of content standards.
- It does not reflect student mastery of any individual content standards.
- Because of the way most assessments are designed to meet alignment criteria, we cannot claim that it reflects a student’s ability to apply the set of content standards to novel problems and situations.
- Finally, we know that the classification of proficiency produced by the assessment is time and grade-level specific. That is, a test score that classifies a student’s performance as proficient at the end of fourth grade is different from the test score that would classify the student’s performance as proficient at the beginning of fourth grade on the same content or at the beginning of fifth grade on different content.
So, where does that leave us in terms of generating actionable student-level information from the summative state assessment?
Making the Most of the Models and Data We Have Available
If our goal is to produce information that helps educators interpret and use the scores that students receive on the state assessment, a good place to start is with the model used to construct the assessment. The set of unidimensional item response theory (IRT) models on which most state assessments are built is designed to produce a wealth of information about the items that an individual student at a particular level of proficiency will be able to answer correctly and those that they will not. Those items do not have to be the specific items the student encountered on the current test form. In fact, the information will be most useful and actionable if it covers much more than the knowledge and skills measured by the 30-40 items administered to the student.
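To make that point concrete, here is a minimal sketch of the two-parameter logistic (2PL) model, one of the unidimensional IRT models commonly underlying state assessments. The item names, parameter values, and proficiency estimate below are hypothetical, chosen only to illustrate how a calibrated item bank lets us estimate, for a student at any proficiency level, the probability of success on items the student never actually saw.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT model: probability that a student with proficiency theta
    answers correctly an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item parameters, for illustration only -- real values
# would come from the state's calibrated item bank.
item_bank = [
    ("identify_main_idea", 1.1, -0.8),  # relatively easy item
    ("compare_two_texts",  0.9,  0.3),
    ("evaluate_argument",  1.4,  1.2),  # relatively hard item
]

theta_hat = 0.25  # a student's estimated proficiency on the IRT scale

for skill, a, b in item_bank:
    print(f"P(correct | {skill}) = {p_correct(theta_hat, a, b):.2f}")
```

Applied across an entire bank of calibrated items rather than three, this is the sense in which the model can describe what a student at a given scale score is likely able to do, well beyond the specific items on the administered form.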
If we combine the model-based information generated from the test with other relevant information about the student or school, such as curricular or instructional program information, student practices, prior student achievement, or perhaps measures of non-cognitive skills, we can further refine the interpretation of the test score and of student performance beyond what the IRT models alone provide.
Yes, it is true that such detailed information will be based on probabilities and group means. It will not be precise, accurate, and specific information about the strengths and weaknesses of an individual student. But no single on-demand state assessment could ever promise to provide that level of information.
Rather than trying to invent new ways of reporting student-level subscores out of whole cloth, let’s dance with the one that brought us and use all of the information provided by our IRT models.