The Need for Program Evaluation to Support Accountability Implementation



How Summer Grilling Illustrates the Differences Between Formative and Summative Evaluation

When I last discussed the need for program evaluation to support accountability implementation, I used the unwelcome issue of car trouble to frame the need for evaluation when designing, developing, and implementing accountability systems. Now, with temperatures dropping as we move through fall, I want to reference a pleasing summer activity, outdoor grilling, to differentiate between formative and summative evaluation. 

Evaluation has been defined as judging the merit of something (Scriven, 1991) or a systematic process that determines the worth of a strategy in a specific context (Guskey, 2000). Both of these definitions require articulating the purpose and use of the thing we’re evaluating. 


Are We Monitoring the Process, the Outcome, or Both?

Grilling is an exercise in informed observation. I remember the first time I had the opportunity to manage the grill. I was 10 years old, and it was exciting and intimidating. Grilling while hosting is certainly easier now than when I was 10, but it still requires active and informed observation. You don’t just take a bite of a guest’s steak or chicken to make sure it’s prepared correctly, right? You make sure the grill is clean and has enough fuel, and you determine whether you need a medium heat or a high sear. While cooking, you check the heat regularly, flip the food (enough, but not too much), potentially cut into the meat to double-check its temperature along the way, and determine when best to serve it (note: letting meat “rest” is not a firm truth but a longstanding myth). 

Believe it or not, these activities during the process of grilling are similar to different types of program evaluation. 

Monitoring the preparation, the cooking conditions, and techniques that improve the process to maximize tastiness is formative evaluation. 

Tasting the finished product and making a final judgment regarding the quality of the food’s preparation is summative evaluation.


What is the Reason for Our Evaluation Effort?

Monitoring and evaluating systems or programs require the same distinction. If we think about program evaluation as the collection, use, and interpretation of information to improve a system or make a judgment about its impact, then we can further delineate between formative and summative evaluation in the following ways: 

  1. Formative: Evaluation to improve the design, development, or implementation of a program or effort.
  2. Summative: Evaluation intended to make a retrospective judgment about a program or effort.

Validity arguments in support of an accountability system depend on a comprehensive set of coherent evidence (AERA, APA, & NCME, 2014) and must be evaluated in light of the system’s intended outcomes, purposes, and uses (see Keng, D’Brot, & Landl, 2018). Both formative and summative approaches are useful when evaluating accountability systems, and they contribute to validity arguments in different ways. Accountability designers will want to use formative evaluation to improve the system with evidence related to the functioning of each of its constituent components. However, a summative evaluation of the quality of the overall system will require linking evidence to claims associated with both the whole and the parts of the accountability system. 


Evaluation is a Systematic Process

One way in which system designers can be aided in conducting an objective evaluation of their own systems is by using a systematic process in advance of development and evaluation. Most complex systems are made of components that must interact coherently and support one another. By recognizing opportunities for formative evaluation that also support summative evaluation, system designers can more efficiently collect evidence or make any necessary system revisions. These opportunities require us to identify intended claims (i.e., assertions about the way in which components should act or interact with and within the system) in order to check assumptions within and across components, and identify the evidence to support that claim. Thus, confirming (or refuting) a component’s claim (i.e., formative evaluation focus) will contribute to the overall collection of evidence across claims to support the accountability system’s validity argument (i.e., summative evaluation focus). 

The following steps are proposed to help systematize evaluation efforts to ensure accountability systems are meeting their intended goals, purposes, and uses (see D’Brot (2018) for more specific examples of claims, assumptions, and evidence). 


  1. Identify the components of the system. For example, the activities associated with the design, development, and implementation of the system.
  2. Determine the specific claim being made for each component that is aligned with the overall system. For example, overall school identification differentiates strongly between high-performing/high-growing schools and high-performing/low-growing schools; school-level measures of career readiness and college readiness contribute significantly to ratings of school quality.
  3. Identify the assumptions for each claim that must hold to support the claim. For example, data collection procedures are not compromised or gamed; data elements in the system relate to one another as intended; the intended emphasis of indicators within the accountability system, like student growth, is reflected sufficiently in how schools are evaluated.
  4. Identify the evidence necessary for each assumption to support the component claim. For example, collected information, results of analyses, and documentation that appropriate methods were applied.
  5. Compile evidence in an organized and coherent manner. For example, through a series of systematic qualitative and/or quantitative analyses.
  6. Examine evidence across components and claims to collect evidence in support of the validity argument for the system as a whole.
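The claim, assumption, and evidence structure in the steps above can be sketched as a small data model. This is a hypothetical illustration, not an implementation from the source: the names (`Claim`, `Component`, `summative_judgment`) and the example component are invented for the sketch. The idea is that checking each claim’s assumptions against evidence is the formative focus, while aggregating those checks across components is the summative focus.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    """A claim about how a system component should function (hypothetical model)."""
    statement: str
    assumptions: list[str]
    evidence: dict[str, bool] = field(default_factory=dict)  # assumption -> supported?

    def supported(self) -> bool:
        # Formative focus: does the evidence uphold every stated assumption?
        return all(self.evidence.get(a, False) for a in self.assumptions)

@dataclass
class Component:
    name: str
    claims: list[Claim]

def summative_judgment(components: list[Component]) -> bool:
    # Summative focus: the overall validity argument holds only if every
    # component's claims are supported by the compiled evidence.
    return all(claim.supported() for comp in components for claim in comp.claims)

# Hypothetical example: a student-growth indicator component with one claim.
growth = Component("student growth indicator", [
    Claim(
        statement="Growth is weighted as intended in school ratings",
        assumptions=["data not gamed", "weights applied as designed"],
        evidence={"data not gamed": True, "weights applied as designed": True},
    )
])
print(summative_judgment([growth]))  # -> True
```

A missing or unsupported assumption flips the component’s claim to unsupported, which in turn blocks the overall summative judgment, mirroring how a single failed component can undermine the system-level validity argument.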


While steps 1 through 5 all support step 6, each step depends on defining the intended goals, uses, and interpretations of the overall system, as well as each component it comprises. By taking a bottom-up approach that considers both component and overall claims, designers can engage in formative evaluation to improve their systems while simultaneously collecting evidence to make a judgment of overall system quality along the way. 

After all, what a shame it would be for a great steak to go to waste because our expertly designed process for grilling it failed. Sometimes, even the most routine components have to be examined to ensure the overall system works. 


AERA, APA, & NCME (2014). Part 1: Validity. In Standards for educational and psychological testing. Washington, DC: AERA.
D’Brot, J. (2018). A framework to monitor and evaluate accountability system efforts. Dover, NH: Center for Assessment.
Guskey, T. R. (2000). Evaluating professional development. Thousand Oaks, CA: Corwin Press.
Keng, L., D’Brot, J., & Landl, E. (2018). Accountability identification is only the beginning: Monitoring and evaluating accountability results and implementation. Washington, DC: CCSSO.
Scriven, M. (1991). Evaluation thesaurus (4th ed.). Newbury Park, CA: Sage.

