Ahead of the Curve: The Future of Classroom-Based Educational Accountability

Fifteen years ago, findings from a study in Tennessee (Sanders & Rivers, 1996) showed that if two young students were given different teachers–one a high performer and the other a low performer–their achievement levels would diverge by more than 50 percentile points within three years. This was followed by another study from Dallas (Bembry et al., 1998) showing that the performance gap between students assigned three effective teachers in a row and those assigned three ineffective teachers in a row was 49 percent.

The negative impact of low-performing teachers is thought to be severe, especially in elementary school, where students who are placed with low-performing teachers several years in a row suffer educational loss that is irreversible. Students who are exposed to ineffective teachers and become slow learners in their early years of schooling have little chance to recover the years that have been lost (Barber & Mourshed, 2007).

The conclusion appears to be unmistakable: It is naïve to assume that student achievement will improve just by making structural or administrative changes to schooling. The quality of an education system fundamentally depends upon the quality of its teachers. The only way to improve student achievement is to improve the quality of instruction. The right people need to become teachers and develop into effective instructors, and the system needs to ensure that all students receive the best possible instruction.

Fundamentally, learning occurs in the classroom when students and teachers interact. Consequently, to improve learning requires improving the quality of that interaction. Successful interventions–coaching classroom practice, moving teacher training into the classroom, developing school leadership, enabling teachers to learn from one another, and delivering these interventions throughout the school system–achieve this purpose.

High performing school systems recognize that the only way to improve student achievement is to improve instruction. The quality of student achievement is basically the sum of the quality of instruction teachers deliver. Delivering quality instruction requires teachers to develop a sophisticated set of skills. In an average third-grade classroom, student achievement levels will often span five grade levels. Teachers need to assess the strengths and weaknesses of each and every student they teach so that they can select the instructional approach that is most appropriate in a given case and then deliver that specific instruction in an effective and efficient manner.

The first challenge is to define what great instruction looks like. Developing the curriculum and associated pedagogies is difficult and controversial from an educational perspective yet relatively more straightforward from a management perspective: the challenge is broadly one of finding the best educators and giving them the opportunity to debate and create a better curriculum and pedagogy.

The second part of the challenge of instruction is much more demanding from a management perspective: giving thousands of teachers–and sometimes tens of thousands of teachers–the capacity and knowledge to deliver great instruction reliably and consistently in circumstances that vary enormously from one classroom to the next. Recognizing this challenge, education systems need to focus their attention, resources and people on strategies to improve classroom instruction, because the only way to improve outcomes is to improve instruction. The reform needs to be taken inside the classroom, where inspired teachers will work with inspired students.

Unlike other professionals who work in teams, teachers usually work alone, often as the only adult in the classroom. Teachers follow enduring practices as they organize classes, maintain order, ask questions, select texts, work on basic skills, and test students. Even as new curricula and technologies enter and exit the classroom, these usual practices persist. Although stakeholders and policymakers may legislate changes in teacher practices, once teachers close their classroom doors, they will use only what they believe will benefit their students. Like a hurricane, policy will often create the illusion of dramatic change, while deep below its surface life goes on uninterrupted. Apparently, not even the President of the United States can mandate what teachers believe matters in a classroom.

Thus, high performing systems need to relentlessly pursue improvement in classroom instruction, and to improve instruction, they need to find ways to fundamentally change what happens inside the classroom. At the level of individual teachers, three things need to happen (Barber & Mourshed, 2007):

Most reforms fail because they cannot do all of these things at the same time. Some reforms emphasize accountability while others introduce performance-based incentives to provide encouragement but fail to provide teachers with knowledge of best practices and an awareness of their own deficiencies. Disembodied ideas cannot possibly result in changes in the classroom. Exposing teachers to best practices through workshops or written materials without showing them how these are applied in their own classrooms will also fail. Doctors and nurses learn in hospitals, clergy learn in churches, lawyers learn in court, consultants learn with clients, and teachers learn in their own classrooms–places where ideas acquire sufficient prescience to become effective. How should any teacher do otherwise?

Teachers develop much of their instructional capability in their first years of training and practice. Evidence suggests that the support given to teachers during this period is rarely as effective as it should be. Frequently, this is because there is little connection between what teachers do during training and what they are expected to do once they arrive in a classroom. A surgeon fresh out of medical school is not expected to operate on a patient until they have completed three or four years of supervised internship. By contrast, teachers fresh out of school are all too often placed into classrooms to teach students with little or no supervision. How is this possible?

The next challenge is to make in-service training an effective tool to improve instruction. Some of the better school systems do this with on-the-job coaching. Expert teachers, trained to coach other teachers, enter classrooms to observe teachers, give feedback, model instruction and share in planning. It all begins with an understanding of what it takes to improve the quality of instruction in a single teacher. This understanding can develop into a system that will create similar conditions for all teachers. Although challenging, large scale transformations of instruction are thus achievable.

High performing school systems go further than this and put processes into place that ensure every student will benefit from this increased capacity. The system needs to ensure that every student–not just some students–has access to high quality instruction. The best systems ensure that schooling compensates for any disadvantages in the student’s home environment. They start by setting high expectations for what individual students should know and be able to do. They ensure that funding and resources are targeted at those students who need these most and not at those who need them least. Performance is closely monitored against these expectations and effective mechanisms are developed for intervention when these expectations go unmet. In the final analysis, there should be a low correlation between student achievement and the home background of individual students. Targeted interventions are essential to ensure that overall achievement levels will rise appreciably.

High performing systems also recognize that they cannot improve what they do not measure. Monitoring performance ensures that the system has the information it needs to be able to intervene when performance begins to falter. Monitoring classrooms allows them to hold schools accountable for their results, to identify and spread best practices, and to pinpoint areas of weakness and provide them with targeted assistance. A certain proportion of targeted intervention budgets should be allocated to study the practices in the best classrooms and schools to ensure that the lessons gained from their experience are transferred to other classrooms and schools. Assessment data should be used to identify the best teachers and classrooms, and these examples should then be used to develop new approaches to instruction and further reform.

In general, the intensity of this monitoring is inversely proportional to overall performance, both within and between systems. Top performing classrooms and schools may be exempt from monitoring on a few occasions and conduct only annual assessments. Classrooms and schools that perform poorly should be subject to more intense scrutiny, followed by appropriate targeted assistance.

Generally speaking, the responsibility for monitoring performance should be separated from the responsibility for improving performance. It is unreasonable to expect that the same people who are responsible for improving education should also judge whether or not improvement has been satisfactory. Typically, as the school system improves, the task of monitoring migrates from external agencies to the schools themselves.

Unlike other professionals who work in teams, teachers usually work alone in circumstances that deprive them of an opportunity to learn from one another. Several school systems have developed strategies to change this by creating schools where teachers often observe one another’s practices. This produces an environment that stimulates the sharing of information about what works best, encourages teachers to give one another regular feedback, and helps shape common aspirations and motivations for improving the quality of instruction.

School leadership ranks second to classroom instruction in its impact on learning (Barber & Mourshed, 2007). Without an effective principal, a school is unlikely to have a culture of high expectations or strive for continuous improvement. High performing school systems leverage their knowledge of effective school leadership to develop their principals into drivers of improvements in instruction. School systems need to structure roles, expectations and incentives to ensure that their principals focus on instructional leadership rather than school administration. Most are expected to be excellent instructors who spend most of their time coaching teachers, maximizing their capacity to effect real improvement in student achievement. Being a teacher is about helping students learn; being a principal is about helping teachers learn.

Every principal spends time observing in-service teachers, and most teacher preparation programs provide pre-service instructors with feedback on their experiences in classrooms. However, the vast majority of these observations rely on informal procedures that have neither been standardized nor validated. Ideal teacher practices are based on broad educational theory or personal preferences. Few pay any attention to the classroom discussions and learning activities that constitute the intellectual core of teaching. Few evaluate the accuracy, coherence and relevance of the content that is actually presented or the clarity with which it is taught. These casual approaches to teacher evaluation are of significant concern because of the lack of standardized criteria for determining effective versus ineffective practices and the lack of standards for providing substantive feedback.

How can we possibly evaluate teachers without focusing on the relevance of activities related to student learning? In the absence of reliable and validated observational tools, the ultimate value of any such observations and feedback is likely to be limited and inconsistent. Teachers are likely to receive widely varying types of feedback and support during both pre-service training and in-service experience. Lacking support for a consistent teaching practice, relatively few teachers can be expected to make systematic progress toward effective teaching.

By using observational tools that have been standardized and validated against student outcomes, educators, mentors and administrators will make consistent comparisons when pointing out the strengths and weaknesses across classrooms, and they will know that the behaviors they are observing are directly related to student growth and development. Use of standardized instruments in no way compromises personalized feedback to individual teachers. Indeed, it provides highly individualized feedback in areas that have been defined consistently across contexts.

In order to enhance student experience and contribute to their academic and social growth, we need not only to measure but also to improve interactions between teachers and students. There are at least three conditions that need to be present for school improvement to occur. Teachers and administrators need to have a common vision and shared goals about what factors contribute to making a classroom successful. There needs to be a reliable and valid mechanism in place to assess the relative standing of classrooms in relation to these goals. Teachers and school personnel need access to professional development experiences that enable progress toward these goals. A well-defined classroom observation system should support all three of these conditions.

Evidently, there are as many definitions of effective practice as there are teachers. Without a shared vocabulary of effective practice, communication between teachers and administrators is likely to be unproductive. Education leaders need to take an active role in defining, assessing and supporting those teacher practices and classroom interaction that favorably impact desirable outcomes. A reliable observation protocol provides these actors with a shared definition of effective practice and a common procedure for elucidating and recording these practices in the classroom. A shared definition of quality practice leads teachers and administrators to focus on classroom interactions that effectively make a difference.

Including direct assessments of teaching practice, in addition to value-added measures of student growth on standardized test scores, provides data that can directly inform targeted programs of professional development. Classroom observations provide a framework for focused and constructive feedback that helps teachers achieve higher levels of desired behavior in their interactions with students in the classroom. The key ingredient of any classroom is the nature and quality of interaction between adults and students. Although other classroom resources, such as curricula, teacher planning and parent involvement are important, the students’ daily experiences in the classrooms interacting with their teachers and other peers are what have the greatest impact on how much students will learn.

Teacher resources–teacher education, professional development, curricular resources, and evaluation/feedback–are linked to desired outcomes–teacher job satisfaction and retention; student social and academic development–through teacher interactions in the classroom. Inputs improve teacher and student outputs by way of social and instructional interactions implemented by teachers in the classroom. There are three broad domains of teaching practice that are linked to positive outcomes: socio-emotional support, organizational managerial support, and instructional support. Each of these domains helps to fully understand the impact of classroom experiences on student performance. When teachers use these practices, students can and do learn more.

Socio-emotional support includes classroom climate, teacher sensitivity and respect for individual student perspectives. In classrooms with a positive climate, teachers and students have positive relationships with each other and clearly enjoy being together and spending time together in the classroom. Teachers are sensitive when they consistently respond to students and are effective when addressing students’ questions, needs and concerns. Sensitivity includes having an awareness of individual students’ academic and emotional needs. Teachers can anticipate areas of difficulty and provide appropriate levels of support for all students in the classroom. Teachers who value student perspectives show that student ideas and opinions are worthy of consideration and encourage meaningful interactions and peer activities where students are allowed to assume leadership roles and take decisions.

Perhaps no other area receives so much attention as classroom management and organization. Management of student attention and behavior is an area of considerable concern to both new and experienced teachers. Well-organized and well-managed classrooms facilitate the development of the students’ self-regulatory skills. Students learn how to regulate their own attention and behavior in order to get the most out of instruction. Students are most likely to behave appropriately in the classroom when rules and expectations are communicated clearly and consistently. Behavior management works best when focused on proactive intervention and positive and efficient redirection of minor misbehavior. High-quality behavior management provides students with specific expectations regarding desirable behavior and continuous reinforcement to meet these expectations.

Productive classrooms run “like well-oiled machines,” where everyone knows what is expected and how to go about their activities. Little or no instructional time is lost due to unclear directions for students, lack of materials, time spent waiting for something to happen, or exaggerated attention to managerial details, like when instructions take longer to administer than the task itself. In effective classrooms, teachers engage students by providing instruction that features visual, oral and movement components. Teachers look for opportunities to engage students in active participation. Student learning is facilitated through group lessons, seat work, and one-on-one time. Well-timed questions and comments expand student involvement. Teachers provide visual schema and summaries to help students focus on the main lesson points and activities.

Instruction can be divided into general and content-specific support. General instruction is relevant and observable across content areas. Content-specific support teaches students particular skills and knowledge such as in reading, math or science. Effective teachers help students comprehend the overarching framework of key concepts and ideas in any given area. Facts, concepts and principles are integrated into a framework, rather than having facts and definitions analyzed in isolation. Effective instruction engages students in higher order thinking skills, such as reasoning, integration, experimentation and a conscious awareness of how one’s own thinking works (executive functioning and meta-cognition). When teachers effectively foster reasoning skills, the cognitive demands of these activities rest primarily with the students, as opposed to situations where the teacher presents information and draws all of the conclusions. Students are expected to independently solve or reason through novel and open-ended tasks.

Effective teaching of specific procedures and skills requires clear exposition of the steps that need to be carried out in sequence. These, in turn, are clearly anchored in the knowledge and skills which students have already acquired. Teachers provide multiple, varied, relevant and age-appropriate examples to exemplify the use of a specific procedure or skill. Alternate problem-solving approaches to the same problem are also discussed. Teachers offer opportunities for supervised practice prior to independent practice implementing the new skills and procedures.

Students learn the most when they are given consistent feedback about their performance. Feedback should focus on the process of learning rather than simply on getting the right answer. High quality feedback provides students with specific information about their work and helps them reach a deeper understanding of concepts than they can get on their own. Teachers delivering high-quality feedback don’t simply stop with “thanks” or “good job.” They engage in ongoing, back and forth exchanges on a regular basis. Effective teachers intentionally provide support for developing increasingly complex verbal communication skills. Teachers facilitate language development when they encourage, respond to, and expand on student talk. High-quality instructional dialogues purposefully engage students in meaningful conversations with teachers and peers. High-quality language-modeling activities involve repeating students’ words in more complex forms and asking follow-up questions to probe student understanding. Students are explicitly introduced to new vocabulary and consistently exposed to a variety of new language uses and forms.

Observational systems have been designed to capture the frequencies of certain behaviors or more holistic patterns of behavior. Time-sampling measures count the number of times specific teaching behaviors occur, including the number of times teachers ask questions during instruction or the number of negative comments made by students. Holistic scoring requires observers to watch for patterns of behavior to make summative judgments about the presence or absence of these behaviors. Observers are asked to assess the degree to which classroom instruction matches a description of evidence-based practices, whether instructional conversations stimulate the students’ higher-order thinking skills or whether classroom interactions contain a high degree of negativity, both between teachers and students and among peers.
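As a minimal sketch of the time-sampling idea, the snippet below tallies coded behaviors from an observation log. The event log, the behavior codes and the function name are hypothetical and are used only to illustrate what a frequency count looks like.

```python
from collections import Counter

# Hypothetical event log: (minute of observation, behavior code).
events = [
    (1, "teacher_question"), (2, "teacher_question"), (2, "negative_comment"),
    (5, "teacher_question"), (7, "negative_comment"), (9, "teacher_question"),
]

def behavior_counts(events):
    """Count how many times each coded behavior occurred in the log."""
    return Counter(code for _, code in events)

counts = behavior_counts(events)
print(counts["teacher_question"], counts["negative_comment"])
```

A count of this kind requires almost no inference from the observer, which is what distinguishes it from the holistic judgments described in the same paragraph.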

The advantage of holistic rankings is that they assess higher-order organizations of behavior that should be more meaningful than considering discrete actions in isolation. After all, positive actions by teachers, like smiling, may have different intentions and may be interpreted differently by students, depending on the context. Teachers may be cheerful, but their emotions may appear to be unrelated to those of their students. In another classroom, a teacher may be more subdued but more in sync with the emotional responses of her students. What is needed is not so much an accounting of teacher actions but rather higher level inferences about the teacher’s ultimate purposes and effects. Counting the number of times a teacher smiles requires fewer inferences than making holistic judgments about the degree to which she fosters a positive classroom climate.

This same point emphasizes the need for standardized procedures that minimize rater effects and coding errors. Basically, informants are asked to quantify their opinions, although typically they do not have numerical access to their opinions. Attempts to respond to a list of question statements with numbers from 1 to 5 are fraught with difficulties such as response styles, amenability bias, halo effects, stereotyped responding, and thresholds that shift while working through the questionnaire. Distortions in the responses strew artifacts through subsequent analyses, interpretations and discussions.

The solution to this problem is to replace rating procedures with ranking procedures, working with ordinal-level rather than interval-level data. The essential feature of ranking is that informants make explicit comparisons: "This is greater than that." This encourages informants to seriously consider their responses, while avoiding response strategies that lessen cognitive effort. Several studies show that data quality is thereby improved.

Although it is necessary for raters to make higher-level inferences, holistic scoring needs to be undertaken in circumstances that ensure the highest levels of objectivity and the smallest possible margin for error. The essential component of useful observation is that every observer selects qualitative judgments from the protocol under identical circumstances, maximizing agreement between observers so that system-wide consistency is ensured. In practice, this is achieved by providing the rater with stark choices, presented side-by-side in close proximity so that individual observers can easily make the right selections.

The appropriate procedure uses card sort rather than Likert items. Card sort offers real choices, such as whether you want “steak and eggs” or “fruit and salad.” These are plausible comparisons and real choices, where one can reasonably infer, for example, that A > B or B > A. By contrast, Likert items ask a person to rate each such statement along a five-point scale. Conceivably, “steak and eggs” and “fruit and salad” could both receive full credit scores, in which case the exercise has been uninformative. Metaphorically speaking, if everything is a priority then nothing is a priority. Card sort prioritizes choice, where each subject necessarily is doing one thing or another, but never both.


Figure 1. Fully functional trilemma card sort application based on Charlotte Danielson items and taxa.
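The contrast between Likert items and card sort can be sketched in a few lines of Python. The responses and helper names below are hypothetical; the point is only that tied ratings imply no ordering, while a forced ranking always yields an explicit comparison.

```python
from itertools import combinations

def pairs_from_ratings(ratings):
    """Return the strict (winner, loser) pairs implied by Likert ratings.

    Tied ratings imply no ordering, so they contribute nothing.
    """
    ordered = []
    for a, b in combinations(ratings, 2):
        if ratings[a] > ratings[b]:
            ordered.append((a, b))
        elif ratings[b] > ratings[a]:
            ordered.append((b, a))
    return ordered

def pairs_from_ranking(ranking):
    """Return the (winner, loser) pairs implied by a forced ranking, best first."""
    return [(ranking[i], ranking[j])
            for i in range(len(ranking))
            for j in range(i + 1, len(ranking))]

# Hypothetical Likert responses: both options get full marks on a five-point scale.
likert = {"steak and eggs": 5, "fruit and salad": 5}
# The same informant, forced to choose in a card sort.
card_sort = ["fruit and salad", "steak and eggs"]

likert_pairs = pairs_from_ratings(likert)       # empty: the ratings are tied
ranking_pairs = pairs_from_ranking(card_sort)   # one explicit comparison
print(likert_pairs)
print(ranking_pairs)
```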

The purpose of observations is not only to gather information on classroom quality but also to use this information to help teachers improve classroom practice and promote student learning. The scoring protocol is designed to assist in translating observational data into professional development planning. The scoring protocol provides guidelines for reviewing results with teachers within a consistent teaching framework. Score profiles are readily translated into discursive handouts that suggest competence-building professional development.

Table 1. Selected Charlotte Danielson items and corresponding taxa

Item | Statement | Taxa | Excerpt
I253 | Is optimistic and self-assured. | T528 | Expert teachers make highly efficient use of time in their management of non-instructional tasks. An expert teacher might take attendance while the students are engaged in another activity. Other non-instructional activities are accomplished in a similarly efficient manner. Procedures are carried out with little expenditure of energy. In fact, procedures for non-instructional duties have evolved such that students themselves play an important role in carrying them out.
I704 | Standards of conduct appear to have been developed with student participation. | T531 | Evidence for how teachers manage classroom procedures is obtained through classroom observation. If asked, students would be able to describe the classroom procedures. In addition, teachers can explain their procedures, how they have been developed, and how students were involved in their creation and maintenance.
I234 | Identifies causes of antisocial, counter-productive or nonproductive behavior. | T534 | Experienced teachers recognize that much of what appears to be student misbehavior is actually a result of other causes, such as these: Students who are unprepared attempt to camouflage their situation by "acting out." Students who don't find a task engaging let their attention wander to more interesting matters. Students pass notes or discuss out-of-class events; a student converts his pencil into an imaginary car and runs it around his desk, with appropriate sound effects.
I310 | Social skills are poorly developed or show evidence of low self-esteem. | T535 | Students who have poorly developed social skills or low self-esteem find opportunities to initiate oral and physical confrontations with other students, disrupting a class.

Note: There are currently 202 Danielson items and 105 Danielson taxa.

The design of the classroom assessment instrument begins with a framework for teaching practices and student-teacher interactions which is itself substantiated by the literature on best practices. Item statements are written to represent this framework and any substantive literature on which it is based. These items are then sorted by human subjects to determine the structure of relationships between every pair of items. A multidimensional map or model of these relationships is generated, where proximity represents inter-item similarity and distance represents dissimilarity.


Figure 2. 57-Item Danielson MDS 3D solution with 10 clusters.
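The step from sorted items to a map of inter-item distances can be illustrated with a small sketch. It assumes, since the text does not specify the sorting task, that each informant groups items into piles, and it takes dissimilarity to be the share of informants who did not place two items together. The sort data and the function name are hypothetical. The resulting matrix is the kind of input a multidimensional scaling routine would take.

```python
def dissimilarity_from_sorts(sorts, n_items):
    """Share of sorters who did NOT place two items in the same pile.

    0.0 means the pair was always grouped together; 1.0 means never.
    """
    together = [[0] * n_items for _ in range(n_items)]
    for piles in sorts:
        for pile in piles:
            for i in pile:
                for j in pile:
                    together[i][j] += 1
    n_sorters = len(sorts)
    return [[1 - together[i][j] / n_sorters for j in range(n_items)]
            for i in range(n_items)]

# Hypothetical data: three sorters each group six items (indices 0-5) into piles.
sorts = [
    [[0, 1, 2], [3, 4, 5]],
    [[0, 1], [2, 3], [4, 5]],
    [[0, 1, 2], [3, 4], [5]],
]
d = dissimilarity_from_sorts(sorts, n_items=6)
print(d[0][1], d[0][5])
```

Items 0 and 1 are grouped together by every sorter and so sit at distance zero, while items 0 and 5 are never grouped together and sit at the maximum distance.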

Item sets of three statements are selected on opposite sides of this model so that classroom observers will be presented with three sharply contrasting statements. This ensures that the informant is presented with plausible comparisons representing real choices. Following a classroom observation period, the informant is asked to sort the item statements. The informant’s task is then to identify the item statement in each set of three that “applies most” and the item statement that “applies least.” In practice, this is done with drag-and-drop so that the items will exchange position on a computer screen, tablet or cell phone. Identifying the item statement that is most descriptive of the classroom target and the item that is least descriptive completes the ordering of the full set of three items, from which one can reasonably infer, for example, C > A, A > B and C > B.

Figure 3. 57-Item Danielson MDS 3D solution with 10 clusters, showing one trilemma set of three items.
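The trilemma step can be sketched as follows. The coordinates, the item labels A to D and both function names are hypothetical, and choosing the triad whose closest pair is farthest apart is only one plausible reading of "opposite sides of this model". The second function shows how the "most" and "least" picks complete the ordering of a set of three.

```python
from itertools import combinations

def most_contrasting_triad(coords):
    """Pick the three items whose closest pair is as far apart as possible."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(coords[a], coords[b])) ** 0.5
    return max(combinations(sorted(coords), 3),
               key=lambda t: min(dist(a, b) for a, b in combinations(t, 2)))

def order_triad(triad, most, least):
    """Complete the ordering of three items from the 'most' and 'least' picks."""
    if most == least or most not in triad or least not in triad:
        raise ValueError("most and least must be two different items of the triad")
    (middle,) = [item for item in triad if item not in (most, least)]
    ranking = [most, middle, least]
    pairs = [(most, middle), (middle, least), (most, least)]
    return ranking, pairs

# Hypothetical 3D map coordinates for four items.
coords = {"A": (0, 0, 0), "B": (4, 0, 0), "C": (0, 3, 0), "D": (0.5, 0.5, 0)}
triad = most_contrasting_triad(coords)

# The informant drags C to "applies most" and B to "applies least".
ranking, pairs = order_triad(triad, most="C", least="B")
print(triad, ranking, pairs)
```

With these picks the sketch reproduces the inference given in the text: C > A, A > B and C > B.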

The informant proceeds in this fashion to order anywhere from 30 to 40 sets of three, representing 90 to 120 pairwise comparisons across all of the domains of the framework. These choices are then scored to produce a multifaceted profile of scale scores for each teacher and classroom visited. While most people find it difficult to work with more than 5-9 scale scores, this procedure is capable of handling a framework with, say, 20 dimensions, and producing a scale score for each of these dimensions. Indeed, this procedure allows us to provide a scale score for each and every item, based on the relations between all of the item rankings, or even scale scores to represent specific paragraphs in the literature on best practices, if so desired.


Figure 4. Fully functional trilemma card sort application based on Charlotte Danielson items and taxa, showing classroom profile set of scores and latent class probabilities.
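The arithmetic behind the comparison counts, and one simple way of turning comparisons into scale scores, can be sketched as follows. The text does not state the scoring rule, so the win-share score used here is an assumption made purely for illustration, as are the function names and the mapping of items to domains.

```python
from collections import defaultdict

def comparisons_per_sort(n_triads):
    """Each fully ordered set of three yields three pairwise comparisons."""
    return 3 * n_triads

def scale_scores(pairs, item_domain):
    """Score each domain as the share of its pairwise comparisons that it won.

    Assumed scoring rule for illustration only.
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for winner, loser in pairs:
        wins[item_domain[winner]] += 1
        total[item_domain[winner]] += 1
        total[item_domain[loser]] += 1
    return {domain: wins[domain] / total[domain] for domain in total}

print(comparisons_per_sort(30), comparisons_per_sort(40))

# Hypothetical mapping of three items to the three broad domains of practice.
item_domain = {"A": "socio-emotional", "B": "managerial", "C": "instructional"}
# Comparisons from one ordered triad: C > A, A > B, C > B.
pairs = [("C", "A"), ("A", "B"), ("C", "B")]
scores = scale_scores(pairs, item_domain)
print(scores)
```

Thirty sets of three give 90 comparisons and forty give 120, which matches the range stated in the text.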

Moreover, this technology is familiar to everyone, because it is the same technology used when search terms are entered into Google Search before Google returns a selection of documents with the highest page rank indices. In this case, the card item statements are analogous to Google search terms, which can then be used to identify the framework entries and literature sources with which they are most closely associated. References to these sources are based on the scale scores for any component of the model and the correlation between that component and any related framework elements and best practices supporting documents.

The linkages between the items, the framework and the literature make this measurement system ideal for targeted professional assistance. A teacher will be able to click on any one of the scale scores included in their personal profile to bring up their item scores and percentile rankings on all of the items in that domain. Clicking on any one of these item scores brings up the associated framework element. Clicking on this element in turn references the supporting best practices literature. These resources should be used with targeted assistance to improve teaching practice and student-teacher interactions on site in the teacher’s own classroom.
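The drill-down from a scale score to items and framework elements amounts to a chain of lookups, sketched below. The item-to-taxon links are taken from Table 1; the domain labels, the item scores and the function name are hypothetical.

```python
# Item -> taxon links taken from Table 1.
ITEM_TAXON = {"I253": "T528", "I704": "T531", "I234": "T534", "I310": "T535"}
# Hypothetical grouping of those items into two domains.
ITEM_DOMAIN = {"I253": "procedures", "I704": "procedures",
               "I234": "behavior", "I310": "behavior"}

def drill_down(domain, item_scores):
    """List (item, score, taxon) rows for one domain, lowest score first."""
    rows = [(item, item_scores[item], ITEM_TAXON[item])
            for item, d in ITEM_DOMAIN.items() if d == domain]
    return sorted(rows, key=lambda row: row[1])

# Hypothetical item scores for one teacher.
item_scores = {"I253": 0.8, "I704": 0.4, "I234": 0.6, "I310": 0.2}
behavior_rows = drill_down("behavior", item_scores)
print(behavior_rows)
```

Sorting the rows with the lowest score first puts the framework element most in need of attention at the top of the list.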

Additionally, this goes a long way toward validating these classroom observation measures since these are directly linked to longer discursive arguments found in the framework and best practices materials. To the extent that teachers are able to recognize teaching deficiencies and enhance student learning by correcting them in their own classrooms, this will achieve “buy-in” with teachers, supervisors, principals and other education stakeholders. At that point, these classroom assessment procedures will have achieved a proven record of demonstrated performance capabilities suitable for widespread adoption in school improvement and accountability initiatives.


Barber, M. & Mourshed, M. (2007). How the World’s Best Performing School Systems Come Out on Top. McKinsey & Co.

Sanders, W. L., & Rivers, J. C. (1996). Cumulative and Residual Effects of Teachers on Future Students’ Academic Achievement. Knoxville, TN: University of Tennessee Value-Added Research and Assessment Center.

Bembry, K., Jordan, H., Gomez, E., Anderson, M., & Mendro, R. (1998). Policy implications of long-term teacher effects on student achievement. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.