Recommendations for Good Practices in Developing and Codifying State Assessment Policy

This is the fourth in a series of CenterLine posts by our 2019 summer interns and their Center mentors based on their project and the assessment and accountability issues they addressed this summer. Zachary Feldberg, from the University of Georgia, worked with Scott Marion on a systematic review of states’ large-scale educational assessment policies.

I remember my teaching days when I had to proctor state-mandated tests and prepare my students to succeed on those tests. When you add that time to the resources spent on test development and the efforts of school administrators and state department officials to execute the tests, the picture of a huge enterprise unfolds. 

Each decision in this process is guided directly or indirectly by education policy. Therefore, we must ensure we have quality educational assessment policy if we want high-quality assessments. While the measurement community may have a strong understanding of good assessment, there is a dearth of knowledge about assessment policy, much less what makes it good. 

This summer, as an intern at The National Center for the Improvement of Educational Assessment (Center for Assessment), I reviewed state educational assessment codes, developed a framework for analyzing state statutes, and applied principles for evaluating assessment policy. My goals were to describe the range of state assessment policies, develop criteria for coding and evaluating these policies, and eventually produce recommendations regarding the types of policies that could support high-quality assessments and assessment systems. I share some of what I learned in this post.

Describing a Range of State Assessment Policy 

The various layers and sources of policy make for a complex web of laws that guide our education system. Policy often develops over the course of years. Sometimes, laws are passed that establish entirely new titles or sections of code, while other times laws change slowly as successive legislatures tweak the statutes bit by bit. Policy also changes on the peripheries: at regular state board of education meetings or when a department of education publishes guidance documents.

Good policy for one state might not be appropriate for another state due to a range of contexts and governance factors. However, some policies may be better than others and some comparisons are possible. The principles provided below are meant to be a framework for such a comparison. The interaction among many of the principles reflects the complexity of educational policy. Policymakers will have to evaluate various tradeoffs as they pursue their goals and try to determine what is right for their constituencies. In spite of the challenges of prescribing what constitutes high-quality policy for any specific state, I offer below general recommendations focused around coherence, stability, specificity, and appropriate use.

Achieve Coherence in a World of Past and Competing Objectives

State policy is influenced by both history and competing objectives. Current code sometimes develops by tweaking or adding to old code, so the old codes and the purposes for which they were written still influence new codes.

For example, let’s say one state has code that requires national norm-referenced tests that are also criterion-referenced and aligned to state standards for standards-based accountability testing, creating a walk down memory lane from the 1970s to the present. That same state also has code requiring performance-based tests of achievement, potentially contradicting the other requirements. How should a state assessment director develop or find a test that meets all those requirements? Under the framework I’ve developed, this example illustrates what I call dissonance, or the tension between competing directives. Good policy avoids dissonance and promotes coherence.

Establish Stability Among Assessment Systems

Stability is not always apparent in the code because it relates to how the code changes over time. As discussed here by Scott Marion, good assessment systems take time to develop, and we should strive for stability when possible. Policymakers need to recognize the trade-offs incurred by changing assessment systems, such as a loss of comparability and educator buy-in. Codes that, for example, provide a longer time for assessment implementation could go a long way toward keeping assessment systems stable. Good policy promotes stability.

Provide Balanced Specificity Within Policy Guidelines

State codes vary considerably in their degree of policy specificity. Some codes stipulate the grades, content, and times of assessments, while others just require, quite generally, that a test be developed and used. Whether specific or general code makes better policy depends on the state, the capacities of the agencies charged with developing the assessments, and the purposes of the assessments. What is most important for policymakers to recognize is that, as with stability, there are trade-offs associated with each decision. For example, many states are feeling pressure to reduce testing time because of the movement against over-testing. One state specifies the maximum length of time students can be tested, but also requires the use of test results for high-stakes decisions. Specifying a time limit may influence the reliability and validity of the tests, raising concerns about using the results in high-stakes decisions. I argue that general code that puts the right resources in the hands of the right people to make the best decisions is better than highly specific mandates that may create dissonance, limit flexibility, and tie assessment leaders’ hands. Good policy creates a balance of specificity and generality.

Determine an Appropriate Use for Student Scores

There is tremendous turnover among state education leaders and other government officials, which can mean a loss of the institutional assessment literacy required for developing high-quality educational assessment policy. See my fellow intern Brittney Hernandez’s post on promoting assessment literacy among policymakers. Even with these challenges, education codes can be written to safeguard against deficits in assessment literacy. For example, one state’s code reads that individual student scores should only be reported and used “for assessments that produce valid individual pupil results.” The code could simply have required reporting results without stipulating “validity,” as other states do. Instead, the qualification enshrines an element of assessment criteria and literacy in the code. Good policy can promote assessment literacy and appropriate use.

This post describes my initial work at the Center for Assessment to better understand the dimensions of assessment policy. Subsequent work with the Center and a group of state assessment directors will involve a full analysis of a subset of states through the lens of the criteria outlined above and, eventually, the development of model policies to help state leaders as they grapple with these tricky issues.


