Making a Bad Situation Worse

Poor Grading Practices Lead to Negative Consequences

The recent controversy over the awarding of valedictorian and salutatorian honors at West Point High School in Mississippi highlighted long-documented problems with grading practices. In case you missed it, you can read accounts of the story here and here. I could write a lengthy post about how the district mishandled this situation and the likely racist factors at play, but I will focus my remarks on how poor grading practices led to this mess in the first place.

Two major grading issues surfaced in this story. The first is that the district used a “100-point” scale to calculate its semester grades and a 0-4 scale to calculate its “quality point average” (QPA). Not surprisingly, the rank ordering of the students is not the same across the two approaches. As I wrote in my previous grading post, the meaning of the scales associated with grades, whether a 0-100 or a 0-4 scale, is hidden behind the veil of numerical objectivity. The 0-100 scale is even worse because it is not really a 100-point scale and suffers from the myth of precision. 
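To see how the two scales can disagree, here is a minimal sketch in Python. The conversion cutoffs (90 and above earns 4 points, 80 to 89 earns 3, and so on) and both transcripts are assumptions invented for illustration; they are not West Point's actual policy or data.

```python
# Hypothetical conversion from a 0-100 course grade to 0-4 quality points.
# The cutoffs are an assumption for illustration, not any district's real policy.
def to_quality_points(score):
    if score >= 90:
        return 4.0
    if score >= 80:
        return 3.0
    if score >= 70:
        return 2.0
    if score >= 60:
        return 1.0
    return 0.0


def numeric_average(scores):
    """Average of the raw 0-100 course grades."""
    return sum(scores) / len(scores)


def quality_point_average(scores):
    """Average after each course grade is converted to quality points."""
    return sum(to_quality_points(s) for s in scores) / len(scores)


# Made-up transcripts for two invented students.
transcripts = {
    "Student A": [100, 100, 89, 89],
    "Student B": [91, 91, 91, 91],
}

for name, scores in transcripts.items():
    print(name, numeric_average(scores), quality_point_average(scores))
```

In this invented case, Student A has the higher numeric average (94.5 versus 91.0) but the lower QPA (3.5 versus 4.0), so each scale would name a different top student.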

The second major problem is the use of weighted grade point averages (GPAs). Weighted GPAs are designed to incentivize and reward students for taking more rigorous courses instead of shopping around for classes with “easy A’s.”  But which courses deserve these additional “quality” points? In some schools, it is just Advanced Placement (AP) courses, but in other schools, honors classes may come with additional points, sometimes as much as a full point on a four-point scale (e.g., 5.0 instead of 4.0 for an A). 
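The choice of which courses earn bonus points can change who comes out on top. The sketch below assumes two hypothetical bonus policies (`AP_ONLY` and `AP_AND_HONORS`) and two invented transcripts; none of it comes from a real school's handbook.

```python
# Unweighted points on a four-point scale.
BASE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

# Two hypothetical bonus policies (assumptions for illustration only).
AP_ONLY = {"AP": 1.0, "honors": 0.0, "regular": 0.0}
AP_AND_HONORS = {"AP": 1.0, "honors": 1.0, "regular": 0.0}


def weighted_gpa(transcript, bonus_policy):
    """Average of base points plus the policy's bonus for each course level.

    Simplification: the bonus is added to every grade, including low ones.
    """
    total = sum(BASE_POINTS[grade] + bonus_policy[level]
                for level, grade in transcript)
    return total / len(transcript)


# Invented transcripts: (course level, letter grade) pairs.
student_c = [("AP", "A"), ("AP", "B"), ("regular", "A"), ("regular", "A")]
student_d = [("honors", "A"), ("honors", "A"), ("honors", "A"), ("regular", "A")]

for policy_name, policy in [("AP only", AP_ONLY), ("AP and honors", AP_AND_HONORS)]:
    print(policy_name,
          weighted_gpa(student_c, policy),
          weighted_gpa(student_d, policy))
```

Under the AP-only policy, Student C ranks first (4.25 versus 4.0). Once honors courses also earn a full point, Student D moves ahead (4.75 versus 4.25), even though neither transcript changed.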

The Lack of Consistent Meaning in Course Grades

Weighted GPAs operate on a logic similar to rewarding Olympic divers with higher multipliers for successfully completing more challenging maneuvers. A critical difference between diving and grading, however, is that panels of experts have determined the specific points associated with each dive, such that the full set of possible dives exists on a difficulty continuum. In other words, there is a shared and transparent understanding of diving rigor. In contrast, the requirements to earn an A or 90%, a pseudo-common scale, in a course are typically determined by the teacher of that particular course. There is no shared or transparent understanding of the meaning and rigor associated with earning an A, B, or C in a particular course.

Part of the rationale behind “quality points” for honors/AP courses is that both the grading is more rigorous and the content is more challenging. We have reasonable evidence to support the latter claim, but not so much regarding the former, in part because it is challenging to come up with fair counterfactuals to evaluate the rigor of grading practices. That said, every student knows there are teachers and courses associated with an easy A, as well as those where it is challenging just to earn a B. Yet the A in the easy course generates more points than the B in the hard course. Shouldn’t the “tough B” get more “quality points” than the easy A? If we used a “course difficulty” scale, like in diving, it would.
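As a rough sketch of what a diving-style scale might look like, the Python below multiplies the base grade points by a course difficulty rating, much as a dive's score is multiplied by its degree of difficulty. The function name and the multipliers (1.0 for the easy course, 1.5 for the hard one) are hypothetical; no school I know of publishes such ratings.

```python
# Unweighted points on a four-point scale.
BASE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def difficulty_adjusted_points(grade, difficulty):
    """Base grade points scaled by a (hypothetical) course difficulty rating."""
    return BASE_POINTS[grade] * difficulty


# Assumed difficulty ratings, invented for illustration.
EASY_COURSE = 1.0
HARD_COURSE = 1.5

easy_a = difficulty_adjusted_points("A", EASY_COURSE)
tough_b = difficulty_adjusted_points("B", HARD_COURSE)
print(easy_a, tough_b)
```

With these assumed ratings, the tough B is worth 4.5 points and the easy A is worth 4.0. The hard part, of course, is not the arithmetic but agreeing on the difficulty ratings in the first place.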

I know that the last suggestion will land like a lead balloon because teachers are resistant to having their grades questioned. Further, we pretend the symbols associated with grades have some underlying and consistent meaning, but they do not. We can borrow from the ways in which we establish cutscores on many summative assessments to address both of these issues.

Creating Shared Understanding of Student Expectations

Setting cutscores on large-scale assessments and certification exams generally begins with creating performance level descriptors (PLDs), which describe how well students must demonstrate their knowledge and skills in a specific domain (generally drawn from the content standards). These PLDs anchor discussions of cutscores by providing clear descriptions of what students at each level (e.g., proficient) know and are able to do. 

There’s no reason why schools/districts couldn’t do something similar for letter grades. These PLDs would have to be written at multiple levels to support the necessary coherence and consistency, such as global PLDs across content areas along with sets of PLDs within each major content area and ultimately for each course. These descriptors can help build a shared understanding of performance expectations along a grading scale, which is enhanced by tying student work examples to each descriptor. 

If PLDs seem unwieldy, school leaders can start by convening groups of teachers within and across grades and subjects to examine samples of student work from prototypical A and B students (perhaps C too). Teachers would then be able to see how their colleagues evaluate student work and award grades. This could also create a shared understanding of the meaning of grading scales and create a form of internal accountability because no teacher wants to be known as the one who gives out easy As.  

But how does this help address the relative arbitrariness of awarding “quality points” for advanced courses? Conceptually, either the PLDs or student work approach should mean that an A is an A no matter the course. I’m not naïve enough to think this will happen easily, but if schools could move in this direction, they could avoid the “quality point” issue.

On the other hand, that would likely mean that an A+ would be pegged to exemplary performance in the most rigorous courses; students in lower-level courses would have trouble earning As if they were not provided with opportunities to demonstrate advanced performance, making this approach politically dicey. However, going down the PLD path would allow school leaders and educators to at least make more defensible decisions even if they decide to continue with a quality points approach. For example, schools could make more rational decisions about which courses deserve additional points and how many.

Equity Issues Result from Poor Grading Practices

I would be remiss if I avoided the equity issues associated with leveled courses. Far too many high schools have multiple strata of courses, each purportedly at different degrees of rigor, that effectively segregate students into more homogeneous classes. The more educationally sound approach is to provide all students with opportunities, preparation, and support to participate in rigorous classes, because, as my colleague Carla Evans noted, grading reforms are not a “magic elixir.”

In short, changes to grading practices must reflect and be the result of deeper improvements in curriculum, instruction, and assessment to have any effect on improving equity and student achievement outcomes in schools.

That said, grading practices are incredibly consequential, especially in light of the push to eliminate the SAT and ACT from college admissions decisions. Consider the progressive admissions system at the University of Texas at Austin that admits all Texas high school students in the top 6% of their high school classes. The quality of grading practices has real consequences when there are so few coveted spots at this elite state university. Further, being named valedictorian could open important scholarship opportunities or other doors to the future.

Coming back to West Point High School, the two Black students with the highest weighted GPAs were recognized for taking and succeeding in rigorous courses. The inconsistency in West Point’s grading policy, combined with weak grading practices, led to this sad and embarrassing situation. Unfortunately, these issues are far too common, even if they do not always carry racial overtones.