Supporting Assessment System Audits | Center for Assessment

Adapting Toolkits to Support Assessment System Audits

Many local and state education agencies strive to develop assessment systems that can be considered “balanced” along a variety of design dimensions. As my colleagues at the Center have discussed, the process of producing a balanced assessment system often requires resolving competing goals such as reducing testing time while simultaneously promoting deeper learning. In this blog post, I am sharing my experiences in adapting toolkits to help a local education agency conduct an assessment system audit and build a balanced assessment system.

The toolkit I created through this process is itself an adaptation of a set of materials developed by the Center called the District Assessment Design Toolkit. Although it grew out of one specific context, I believe the adaptation strategies I present will be useful in many others, since mapping out an assessment space is a relatively common strategic problem.

Designing the Show

Balanced assessment systems should ideally be comprehensive, coherent, continuous, efficient, and useful. Getting to such a system is a complicated endeavor because “balancing” in this context is more akin to mixing music on a mixing board than to finding a simple equilibrium on a scale or balance beam. As with a well-mixed performance, the resulting symphonic experience is more than the sum of its constituent elements. Its effects result from applying sound scientific practices (e.g., lighting design, sound engineering, stage design) to a professional creative endeavor. Let’s set the stage and briefly walk through a few considerations together.

Practically speaking, balancing an assessment system requires interdisciplinary, cross-departmental processes in which colleagues with a variety of backgrounds and experiences work together across vertical and horizontal layers of organizations. To support these processes, at some point key information about the current state of an assessment system should be succinctly summarized so effective strategic planning about future states can occur.

One helpful mechanism for this purpose is a formal audit of the current assessment system. However, just as there is no singular way of balancing an assessment system or composing music, there is no singular “correct” way of conducting an assessment audit. In the end, any existing off-the-shelf templates, guides, and processes have to be adapted so that they fit the information-gathering and insight-generating needs of the particular context in question.

Consistent with our open-source dissemination policy at the Center, the District Assessment Design Toolkit that I present in a moment can be freely shared and repurposed with appropriate attribution to the Center. These materials can hopefully serve as inspiration for you to develop your own adaptation and to share materials that you have found useful in your own work. In other words, take the best of both artistic worlds (or some other world) and make the materials your own!

Setting Up the Stage

There are two core artifacts in this set of materials.

The first artifact is a guide for auditors, which outlines the purpose of the audit, a few broader design parameters for the audit, and how the audit process should be implemented. The main section includes definitions of all of the columns (i.e., variables/design dimensions) in the audit template along with instructions for how to code individual assessment types or assessments using these design dimensions. Depending on the team that is working on the audit, these components could be expanded upon and modified. For example, one could make available an internal glossary with a variety of terms expressed at a level of complexity that matches the prior training and experience of the staff.

A variety of important design choices for the core audit are baked into the structure of the template and these instructions as well. These choices include the kinds of design dimensions to be captured, the grain size at which information is encoded, and the particular way in which assessments are grouped and organized. They have to be made with attention to factors such as coding time, the expertise required to make judgments, and the consistency of information across teams for comparability.

For example, in the context in which this form was developed, it made the most sense to differentiate assessment types (e.g., differentiate unit tests from midterms) but not each individual assessment within each type (e.g., each unit test, each midterm) because they shared a large number of common design characteristics. However, it did make sense to break the information down by grade level even if doing so meant a certain degree of redundancy since grade-specific breakdowns of information were helpful for sense-making when looking at the results.
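To make the grain-size decision concrete, here is a minimal sketch of what one row of such a template could look like when it is recorded per assessment type and per grade. The column names (`content_area`, `grade`, `assessment_type`, `administrations_per_year`) and the helper `expand_by_grade` are hypothetical illustrations, not the actual columns of the Center's toolkit, and the values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRow:
    """One audit record: an assessment TYPE within one grade (hypothetical columns)."""
    content_area: str
    grade: int
    assessment_type: str           # e.g. "unit test" or "midterm", not each individual test
    administrations_per_year: int  # illustrative design dimension


def expand_by_grade(content_area, assessment_type, administrations_per_year, grades):
    """Repeat one assessment type across grades, accepting some redundancy."""
    return [
        AuditRow(content_area, grade, assessment_type, administrations_per_year)
        for grade in grades
    ]


# Made-up example: two assessment types coded for three grades.
rows = expand_by_grade("Mathematics", "unit test", 6, grades=[3, 4, 5])
rows += expand_by_grade("Mathematics", "midterm", 1, grades=[3, 4, 5])

for row in rows:
    print(row)
```

The repeated rows are redundant by design: they cost a few copy-and-paste operations at coding time but keep grade-level breakdowns available later.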

Some of the resulting design decisions can generally be made globally (e.g., the grade-level breakdown decision) but some decisions need to be made at the level of a particular team with a particular scope of responsibility (e.g., which assessment types should be distinguished for a course or content area).

In any form, there may also be some language that is specific to a particular use context. For example, this form uses the term “internally-developed” to describe assessments because the organization that conducted the audit developed its own curricular materials and curriculum-embedded assessments while also utilizing vendor-provided solutions and administering state- or federally-mandated assessments.

The second artifact is the template in which the information is recorded. In this context, we chose a simple spreadsheet, color-enhanced and organized into blocks, because this format was the easiest to use for the auditors. We could have set it up as a survey form as well, but doing so would have prevented some simpler copy-and-paste operations across rows. Again, both scientific and practical considerations drive the setup of the tool here.

Sound Check

Once the data for an audit are collected, there are a variety of ways in which the information can be summarized to gain insight into how balanced the current assessment system is. As with other scientific research, it is critical to think through the architecture of this targeted, evidence-based story at the outset, since some analyses may be precluded if the data are not collected at an appropriate level of breakdown.

For example, if grade-specific breakdowns by assessment types and content areas are desired, then one has to make sure that content areas, grades, and assessment types are separately tracked. Similarly, if quantitative summaries of information such as counts, percentages, or ratings are desired, one has to think right at the outset about the appropriate metric for each design dimension of interest and whether these carry the appropriate meaning. For example, while it makes sense to record percentages of items with particular characteristics across assessment forms of varying length (e.g., the percentage of items targeted for future reuse), it may make more sense to count the specific number of essays or performance tasks since they are generally fewer in number.
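The sketch below illustrates both points with made-up data: because content area, grade, and assessment type are tracked as separate fields, the same records can be grouped at different levels, and item characteristics are summarized as a percentage while performance tasks are summarized as a count. All field names and numbers are assumptions for illustration, not values from an actual audit.

```python
from collections import defaultdict

# Hypothetical audit records; every field name and value is illustrative.
records = [
    {"content_area": "Mathematics", "grade": 3, "assessment_type": "unit test",
     "n_items": 20, "n_reusable_items": 15, "n_performance_tasks": 0},
    {"content_area": "Mathematics", "grade": 3, "assessment_type": "midterm",
     "n_items": 30, "n_reusable_items": 15, "n_performance_tasks": 1},
    {"content_area": "Mathematics", "grade": 4, "assessment_type": "unit test",
     "n_items": 25, "n_reusable_items": 5, "n_performance_tasks": 2},
    {"content_area": "ELA", "grade": 3, "assessment_type": "unit test",
     "n_items": 10, "n_reusable_items": 5, "n_performance_tasks": 3},
]


def summarize(records, keys):
    """Group records by `keys`; report a pooled percentage and a raw count."""
    totals = defaultdict(lambda: {"items": 0, "reusable": 0, "tasks": 0})
    for record in records:
        group = tuple(record[key] for key in keys)
        totals[group]["items"] += record["n_items"]
        totals[group]["reusable"] += record["n_reusable_items"]
        totals[group]["tasks"] += record["n_performance_tasks"]

    summary = {}
    for group, t in totals.items():
        # Percentage for item characteristics (comparable across form lengths).
        pct = 100.0 * t["reusable"] / t["items"] if t["items"] else None
        # Plain count for performance tasks, which are few in number.
        summary[group] = {"pct_reusable_items": pct,
                          "n_performance_tasks": t["tasks"]}
    return summary


by_grade = summarize(records, ["content_area", "grade"])
by_type = summarize(records, ["content_area", "grade", "assessment_type"])

for group, stats in by_grade.items():
    print(group, stats)
```

Note that the percentage here is pooled over all items in a group, so longer forms carry more weight; averaging per-form percentages instead is an equally defensible choice that should be settled before data collection.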

Finally, as I noted briefly above, there is also the issue of the grain size and complexity of human judgments, which cannot be resolved merely by pulling the appropriate data. For example, the most meaningful answer to the question about the curricular alignment of assessments is likely going to come from formal alignment studies and possibly complementary audits that capture a variety of qualitative information in more detail.

However, if a “quick overview” of certain aspects of the assessment system is desired, it may be helpful to include certain coarse-grained judgment questions on the audit form. Sometimes having a thinner slice of evidence about an important issue (e.g., reusability plans for assessments, or perceptions of item quality) is preferable to not having any information at all, especially if richer qualitative information takes more time and human resources to collect.

The Symphonic Experience

Putting on a good show that hits all the right notes is an incredibly complex endeavor. Whether storytelling happens through music or a professional audit, the experience always results from scientific principles applied rigorously and artfully by humans. As I discussed in this post, there are clearly a lot of design decisions to consider when conducting an audit.

If your organization is interested in designing, implementing, and evaluating an audit of its assessment system, you may find several resources available on the Center website to be helpful. We would of course also love to hear from you about how you told your story and engaged in this work: what kinds of tools have you been using in your local contexts, and what methodological or practical lessons have you learned from conducting these audits?