Guidance for Addressing N-Size in School Accountability and Reporting Systems
Small Details Can Make a Big Difference in Accountability and Reporting Systems
The common idiom “the devil is in the details” certainly applies to the design and implementation of school accountability systems.
This expression reminds us that little things can make a big difference. In this post, I address one of the seemingly minor issues that matters a lot: n-size.
What is n-size? For our purposes, n-size refers to the minimum number (n) of students required to report an outcome, such as proficiency or graduation rate. N-size may not be top of mind when you think about all the important decisions that go into designing and implementing an educational accountability or reporting system. But I can assure you, the unassuming n-size carries a lot of weight. Stick with me.
Considerations When Determining Minimum N-size for School Accountability Systems
It may surprise readers that there isn’t a standard practice when it comes to setting the minimum n-size for contemporary school accountability systems. The n-size used in most state systems varies from as low as 5 to as high as 30. The minimum n-size decision is often influenced by at least three factors: privacy, reliability, and inclusion. The truth is, there isn’t a single right number for every system. What’s more, these considerations pull in different, often competing, directions. I’ll explain.
How Ensuring the Privacy of Test Results Influences a Minimum N-Size for Reporting
Privacy refers to the need to establish a sufficient minimum n-size for reporting so that an individual student’s results are not identifiable. This consideration becomes particularly important when reporting outcomes for student groups. The Every Student Succeeds Act (ESSA) requires that states report outcomes by race/ethnicity, economically disadvantaged status (ED), students with disabilities (SWD), and English learners (EL). While group sizes are often sufficient at the school level, maintaining confidentiality gets trickier when reporting for a combination of factors, such as SWD by grade or EL by race/ethnicity.
There are a range of ways that states protect the confidentiality of student data in reporting. One approach is to use suppression rules to mask outcomes when group sizes are too low. A related strategy is to group results into broader categories to avoid reporting that all or no students attained an outcome (e.g., less than 5% or greater than 90%). A full discussion of strategies to protect the confidentiality of data is beyond the scope of this piece. Suffice it to say establishing a sufficient minimum n-size is an important part of an overall plan to protect the privacy of publicly reported student data.
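The suppression and broad-category strategies described above can be sketched in a few lines of code. This is a minimal, hypothetical example; the function name, the minimum n of 10, and the 5%/90% cut points are illustrative choices, not any state’s actual rules.

```python
# Hypothetical suppression helper: masks a reported outcome when the
# group is too small, or bottom-/top-codes rates that would reveal
# that nearly all or nearly no students attained the outcome.
def report_rate(n_students: int, rate: float, min_n: int = 10) -> str:
    """Return a display string for a proficiency or graduation rate.

    n_students: number of students in the group
    rate: proportion attaining the outcome (0.0 to 1.0)
    min_n: minimum group size for reporting (illustrative choice)
    """
    if n_students < min_n:
        return "*"        # suppressed: group below minimum n-size
    if rate < 0.05:
        return "<5%"      # bottom-coded to protect privacy
    if rate > 0.90:
        return ">90%"     # top-coded to protect privacy
    return f"{rate:.0%}"

print(report_rate(6, 0.50))    # "*"   (only 6 students)
print(report_rate(25, 0.96))   # ">90%"
print(report_rate(25, 0.60))   # "60%"
```

Real systems layer additional protections (e.g., complementary suppression so a masked cell can’t be back-calculated from totals), but the basic logic follows this shape.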
How the Reliability of Accountability Indicators Influences N-Size Based on Intended Interpretations/Uses
Reliability refers to the extent to which the data are sufficiently precise or trustworthy for the intended interpretation or use. One may have concluded that a minimum group size of, say, 5 is adequate to protect the confidentiality of publicly reported data, but be concerned that 5 students isn’t a sufficient number to accurately represent the performance of a student group. This concern is elevated when there is an accountability consequence associated with the outcome. But what if there are only 5 students at a school in a particular group?
In technical terms, this concern is related to ‘sampling error.’ If accountability systems are built on the assumption that data from a group of students at a school represent a sample from a broader population of students who could be enrolled in that school, it’s worth understanding the error associated with that sample. For a more complete treatment of the issue and impact of sampling error, I recommend Rich Hill and Charlie DePascale’s seminal 2003 paper.
The key point is this: as n-size increases, sampling error decreases. I’ll spare readers the formula, but the table below shows how the standard error of a proportion – in this case, a hypothetical proficiency rate of 60% – is influenced by n-size.
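For readers who want the formula after all, the standard error of a proportion is sqrt(p(1 − p)/n). The short sketch below reproduces the values for a hypothetical 60% proficiency rate across the range of n-sizes states typically use.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.60  # hypothetical proficiency rate
for n in (5, 10, 15, 20, 25, 30):
    print(f"n = {n:2d}: SE = {standard_error(p, n):.1%}")
# n =  5: SE = 21.9%
# n = 10: SE = 15.5%
# n = 15: SE = 12.6%
# n = 20: SE = 11.0%
# n = 25: SE = 9.8%
# n = 30: SE = 8.9%
```

Note how the error shrinks quickly at first and more slowly after n reaches about 20 – a pattern that matters when weighing reliability against inclusion.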
Look what happens when we use that standard error to create confidence intervals around our 60% proficiency rate. Adding and subtracting two standard errors, which corresponds to roughly 95% confidence that the true value falls in the range, yields a spread of about 16% to 100% for a sample of 5. That’s nearly the entire range. The range narrows to about 42% to 78% for a sample of 30.
To be fair, there are those who dispute this application of sampling theory to school accountability. They might argue that there is no hypothetical underlying population to which we are making an inference; rather, the enrolled students at the school ARE the full population. I’ll save that debate for another paper, but it doesn’t change the point that data from larger group sizes are more trustworthy.
How Inclusion is Impacted by N-Size
Inclusion refers to the desire to include as many schools and groups as is practicable. As n-size increases, inclusion decreases and vice versa. Many advocates regard this as an equity issue, arguing that n-sizes must be set as low as possible in order to promote transparency and accountability for as many schools and groups as possible.
Obviously, there is a tension between lowering n-size to maximize inclusion and raising n-size to safeguard privacy and promote reliability.
As a starting point to inform discussions about inclusion, it’s useful to evaluate the impact of various n-size decisions by student group. The figure below, created with hypothetical data, illustrates one approach.
This graph shows the percent of schools that would be included (vertical axis) for different n-size choices (horizontal axis). The different colored lines show the impact for all students and for selected student groups. It’s useful to examine these displays for an ‘inflection point,’ a value beyond which inclusion drops off more sharply. For example, in the graph above, increasing the n-size from 15 to 20 appears to sharply reduce inclusion for many groups.
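The underlying computation for this kind of display is straightforward. The sketch below uses made-up group counts (the `el_counts` list is hypothetical, like the figure’s data) to show how inclusion falls as the minimum n-size rises.

```python
# Hypothetical sketch: given the number of students in a group at each
# school, compute the percent of schools that would be included in
# reporting under a range of minimum n-size rules.
def percent_included(group_sizes, min_n):
    """Percent of schools whose group meets the minimum n-size."""
    included = sum(1 for size in group_sizes if size >= min_n)
    return 100 * included / len(group_sizes)

# Hypothetical counts of English learners at 8 schools
el_counts = [3, 8, 12, 14, 18, 22, 35, 60]
for n in (5, 10, 15, 20, 25, 30):
    pct = percent_included(el_counts, n)
    print(f"min n = {n:2d}: {pct:.0f}% of schools included")
```

Running this for each student group, rather than one group at a time, produces the family of curves shown in the figure and makes any inflection point easy to spot.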
Guidelines for Deciding on a Minimum N-Size
As noted previously, there is no single right answer when it comes to establishing n-size. However, there are some guidelines that can help negotiate the tradeoffs and support a thoughtful decision-making process.
- Consider the context. High-stakes decisions that are solely or largely influenced by the performance of a single group on a single indicator require greater precision, which suggests that a higher n-size should be prioritized. However, a lower n-size may be defensible when the stakes are less consequential and/or are informed by multiple sources of information.
- Differentiate as appropriate. There’s no reason that n-size rules have to be ‘one size fits all’ throughout the entire system. For example, a state or district can adopt one n-size rule for use with more consequential accountability decisions and another for public reporting.
- Explore supplemental strategies to augment inclusion. Lowering n-size is not the only way to increase inclusion. One strategy is multi-year aggregation, which involves bolstering n-size, and increasing inclusion, by drawing on data collected in previous years. Data can be aggregated by simple averaging or creating a weighted composite (e.g., treating recent data as more influential). Another strategy is to create additional student groups based on policy priorities, such as measuring growth separately for students who are among the lowest performing 25% in the school.
- Evaluate and monitor. Finally, any thoughtful approach should be accompanied by systematic evaluation and ongoing monitoring. There are many more approaches to examining factors such as reliability and inclusion than are covered in this blog. States and districts should work with technical advisors to establish a program of research and review. For example, does the evidence support inferences about school quality? Are supports effective? An evaluation and monitoring plan that includes attention to the role of n-size will help leaders ensure the decisions are appropriate and defensible given the context and policy priorities of the agency.
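The multi-year aggregation strategy described above can be sketched simply. This is a hypothetical illustration: the three years of counts and the 0.2/0.3/0.5 weights are made-up values chosen to show how pooling boosts n-size and how a weighted composite treats recent data as more influential.

```python
# Hypothetical sketch of multi-year aggregation.
def pooled_rate(yearly):
    """Simple pooled rate: total proficient / total tested across years.

    yearly: list of (n_proficient, n_tested) tuples, oldest year first.
    """
    proficient = sum(p for p, _ in yearly)
    tested = sum(t for _, t in yearly)
    return proficient / tested

def weighted_rate(yearly, weights):
    """Weighted composite of yearly rates (weights should sum to 1)."""
    return sum(w * (p / t) for (p, t), w in zip(yearly, weights))

# Three years of hypothetical data for a small student group
data = [(4, 8), (5, 10), (6, 9)]
print(f"pooled n = {sum(t for _, t in data)}")   # 27 students vs. 9 in one year
print(f"pooled rate = {pooled_rate(data):.1%}")  # 55.6%
print(f"weighted rate = {weighted_rate(data, (0.2, 0.3, 0.5)):.1%}")
```

Pooling three years turns a group of 9 into an effective n of 27, often enough to clear a minimum n-size that a single year would not.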
Final Thoughts and a Different Perspective on Accountability Indicators
There’s another version of the idiom I shared at the start of this post: “God is in the details.” I prefer that one. It shifts the focus from the details as a source of trouble or chaos to a perspective where the details can come together to create something sublime.
Using words like “sublime” to refer to accountability and reporting systems is probably a stretch. But I contend that careful attention to the many seemingly small but important decisions, such as n-size, can make a big difference in the final product.