Reflections on Large-Scale State Assessment in the Twenty-Tens

What Have We Learned and What’s Ahead? Looking Back at a Transformative Decade in Educational Assessment

In considering a decade of large-scale state assessment, I’m reminded of a moment in June 2010. While attending the Council of Chief State School Officers (CCSSO) National Conference on Student Assessment (NCSA) in Detroit, Michigan, I was part of a crowd gathered in the hotel lounge watching the U.S. Men’s National Team (USMNT) play a World Cup soccer match against Algeria.

Such is the cachet of the World Cup that otherwise indifferent observers become, if only for a moment, superfans. Since I’m already a superfan, I was a nervous wreck. A scoreless draw would eliminate the U.S. from the tournament. The minutes ticked by with no goal, and the match went into stoppage time.

Then IT happened.  

Landon Donovan raced down the field, collected a deflected shot, and buried the winning goal in the final moments of the match. The USMNT players piled up exuberantly on the field. The brilliant broadcaster Ian Darke exclaimed, “You couldn’t write a script like this!” The crowd at the hotel in Detroit erupted in celebration. In fact, fans all over the country chanted “U-S-A!” as they celebrated the result. What a moment!

But the moment was fleeting. The USMNT would be promptly eliminated in the next match. They wouldn’t fare better in the 2014 World Cup. 

Fast-forward to October 10, 2017, which was probably the nadir for modern U.S. soccer. I sat alone that day in a hotel room in Jackson, Mississippi, and watched as the USMNT failed to qualify for the 2018 World Cup. The moment was a stark contrast to that celebration in Detroit seven years earlier.

There’s been a lot of hand-wringing since that loss. Soccer pundits fumed. Calls emerged for wholesale changes in U.S. soccer, changes that were probably long overdue. Many hope these changes will lay the groundwork for longer-term success. Okay, it’s a little corny, but I can’t help but think of the adage, “Sometimes a setback is a setup for a comeback.”

This attention to the state of U.S. soccer seems like a suitable metaphor for framing my thinking about the past decade in assessment.

How so? In the twenty-tens, developments in assessment followed much the same pattern as U.S. soccer: an exuberant start, setbacks, and, finally, renewed optimism and clarity about the road ahead.

An Exuberant Start – The Common Core and Next-Generation Assessments

My assessment colleagues and I had reasons to be enthusiastic in 2010. The outlook for assessment was bright. The previous year, the United States Department of Education (USED) had announced funding to develop “next generation” tests through the Race to the Top (RTT) program. That very month, the Common Core State Standards (CCSS) were released, and by the end of the summer they would be adopted by all but a handful of states. In fact, less than three months after that June NCSA meeting, then-U.S. Secretary of Education Arne Duncan would award $330 million to fund two assessment consortia. At that time, fully 44 states were members of at least one consortium.

It’s hard to describe the enthusiasm and high expectations for assessment in 2010. At the time, all the talk was about moving away from superficial “bubble tests” and measuring what really matters. If someone had asked me only a year or two earlier whether this shift was likely, I might have replied, “You couldn’t write a script like this.”

In those expectant days, there was no shortage of optimism that these tests and standards would be an essential part of a long-awaited national education reform. That hope was based on two priorities.  

  • First, the content and rigor of the tests had to reflect the skills required to be prepared for post-secondary success.  
  • Second, the tests had to produce comparable results across states.   

This view was articulated by Secretary Duncan, speaking to state leaders at Achieve’s American Diploma Project shortly after the grant award: “We will ask the two consortia to collaborate to make results comparable across the state consortium.” He reasoned, “Transparency, and the honest dialogue it will create, will drive school reform to a whole new level.”

Expressions such as “tests worth teaching to,” “truth in advertising,” and closing the “honesty gap” captured the mood at the time.

The expectation that big change was coming was amplified further in September 2011, when President Obama announced a series of waivers offering flexibility from what many saw as increasingly oppressive No Child Left Behind (NCLB) mandates. To be eligible for those waivers, states had to meet three core principles:

1) demonstrate that they had college- and career-ready expectations for all students;

2) develop accountability systems that valued progress as well as achievement; and

3) adopt multiple measures to evaluate teacher and leader effectiveness that included student performance. 


In the midst of all this enthusiasm, my colleague Charlie DePascale wrote a sobering commentary for EdWeek in August 2011 titled “Salvaging RTT Assessment.”

Wait, what? Salvaging? Things seemed to be going great. Almost every state was in one, if not both, of the consortia at the time, and optimism was soaring.

Turns out, Charlie was prescient.   

In his essay, Charlie warned that the success of the reforms that inspired the RTT program would require more clearly defined outcomes and investments to support sustained implementation. Then he delivered the key point, in a way I would not fully appreciate until later: “Beyond that lack of a clear goal and the massive complexity of the charge at hand, the most serious threat to the success of the RTT assessment effort may be the danger of viewing the entire program through the lens of large-scale assessment.”

In retrospect, we know that’s exactly what happened. As the weight of the policy initiatives and reform efforts mounted, the state summative tests, laden with impossible expectations, became the most conspicuous target. Questions emerged about the CCSS, federal overreach, data privacy, and the merits of accountability initiatives, especially educator evaluation.

Ultimately, many of these issues became politicized, and by the middle of the decade the opt-out movement was gaining steam. In 2015, when the brand-new next-generation tests were introduced, as many as 20% of students in some states refused to participate.

Moreover, due to a lack of support from state leaders, cross-state comparisons of test scores never materialized.

Adding to the policy and political concerns were a host of practical ones. In an effort to make the new tests credible measures of post-secondary readiness, the consortia produced initial assessments that were much longer, more burdensome to administer, and in some cases more expensive than the ones they replaced.

Also, simmering worries about local control surfaced. It was easier for state leaders to support the abstract idea of working with other states when the tough decisions were too distant to provoke dispute. But when the time came to make final decisions about things like accommodations and accessibility features, scoring procedures, administration time, report designs, and much more, the already-frayed connections began to unravel.   

Most states have ‘officially’ moved on from the CCSS and adopted new state standards, although many of those standards still closely resemble the CCSS.

As for the consortia assessments, the number of students taking them has diminished considerably. Approximately sixteen states still administer one of the consortia assessments, most of them Smarter Balanced. The PARCC consortium has dissolved. Only a few states use the former PARCC ‘flagship’ form or leverage PARCC content, now managed by New Meridian, with alternative blueprint options.

Most states work with a contractor to develop their own summative test in ELA and mathematics, as was the case before RTT.  

The Every Student Succeeds Act (ESSA) replaced NCLB late in 2015. Although ESSA held out the promise of some flexibility and opportunity to innovate, the core requirements for state assessment and accountability remained largely unchanged from NCLB.

I don’t think it’s unfair to say that the optimism of 2010 about the impact of the RTT consortia didn’t play out – at least not in the way that was anticipated. 

Renewed Optimism – Better Tests and Bold New Directions in Assessment and Education Reform 

It’s only fair to acknowledge that while the RTT program was not successful in every way, it was successful in many ways.

  • In my judgment, the tests developed by these consortia represent a substantial improvement over the state assessments they replaced.
  • In almost every instance, states dropped their legacy assessments and pursued new tests that entirely or mostly embraced the same goals and set markedly more rigorous expectations.

Still, I think the most important development to emerge since the RTT setback is the increased emphasis on comprehensive and balanced assessment systems.

While much work remains in this area, my colleagues have written extensively about these topics (see, e.g., the paper by Marion et al. from the Center’s 2018 RILS event) and are actively working with states to stand up more thoughtfully designed, lasting models built on a more complete and credible theory of action for improvement. Additionally, states are paying more attention to the role of formative and interim measures in an assessment system and to the interrelationships among its various components.

As the decade draws to a close, there’s no doubt the assessment field has emerged with some wounds. But I hope we are wiser for the experience and better equipped for success moving forward. If we learn the lessons of this decade, we should look to the next with a sense of humility about the limits of summative assessment and think more broadly about the essential connections among assessment, curriculum, and instruction. Finally, we must attend to capacity building in each area if we are serious about sustainable and scalable reform.   

Here’s hoping the twenty-twenties bring more innovations and improvements in assessment.

And I wouldn’t mind a good result for the U.S. soccer team at the 2022 World Cup, either.