University of Illinois, Champaign-Urbana
[Robert Weissberg is a professor of political science at the University of Illinois, Champaign-Urbana, specializing in public opinion, electoral behavior, methodology, and American government. His writings include several textbooks, numerous articles in the major journals in his field, and political humor. He is currently at work on a critical analysis of the political science profession. This article is reprinted with permission from Perspectives on Political Science 22.1, Winter 1993.]
The standardized paper-and-pencil teacher/course-evaluation form has become a ritual at virtually every major university in the United States. One recent survey (Ory 1991) reports that over 70 percent of all institutions collect student-rating data, and among large research institutions, the figure is 100 percent. Commercially available rating systems have also become big business. Everyone from internationally known, distinguished professors to lowly teaching assistants is being judged, for example, on the fairness of their grading procedures and their ability to explain the material. Job applicants often enclose statistical summaries of their ratings as evidence of teaching ability. Compared to the often raucous debates over, say, a culturally diverse curriculum or acceptance of military research contract work, the imposition of this alleged quality control has gone almost unnoticed. Even those denied merit raises or promotions because of low scores are fairly mute about the process by which the judgment was reached. After all, who wants to be against attempts to encourage good teaching and university accountability?
Much has been written about these ubiquitous moments of judgment, but almost all of this literature emphasizes the technical aspects of measurement. Educational psychologists have had a feast with the never-ending volumes of undergraduate-supplied data. Bibliographies run for pages and have such entries as "The Generalizability of Student Ratings of Instructors: Item Specificity and Section Effects" (Crooks 1981) and "Seniority and Experience of College Teachers as Related to Evaluations They Receive from Students" (Feldman 1983). A 1988 review of the field estimates the number of studies of the subject to exceed 1,300 (Cashin 1988). This literature on the applications of various techniques and their technical characteristics is valuable, but it scarcely touches how such quantitative assessments of teaching shape political relationships within the university. I shall argue that this new, scientific approach to "good teaching" brings with it a significant change in university power as well as a new understanding of teaching. That "please take five minutes to complete the form" exercise is not the status quo made better via computer scoring of pencil scratches. In a nutshell, the computerized standard evaluating instrument is a powerful tool for managing the faculty. It is also a process that would easily permit political pressure disguised as "rewarding superior teaching."
There is no official record of when the first standardized teacher-evaluation form emerged from the ooze and crawled onto dry land, but it must have been fairly recently (some data on this point are presented in Doyle, chapter 1). American higher education survived, even prospered, for over a century without many computerized assessments. And, as every teacher knew, this "pre-modern" era had lots of mechanisms to ensure minimal pedagogical competency. Indeed, it would be very difficult to argue that the modern evaluative mechanism is a superior, more powerful product of a long evolutionary process. It may even be a distinctly weaker form of quality control. Let us not equate mountains of data with automatic improvement.
Enrollment has always been one clue to instructional quality. Every department is quite aware of professors whose very name on a course announcement is the kiss of death. Other professors are well known for their ability to reduce large lectures to small seminars in three weeks or less. Class size and attrition do not, of course, by themselves automatically reveal quality of teaching. Nevertheless, when considered with other obtainable information such as reading lists, paper assignments, and grading, enrollment is one indication of educational performance. A teacher who hardly burdened his students with assignments and low grades and still drew poorly could scarcely claim to be reaching only the most serious, truly gifted students. Conversely, there are those who fill seats even while giving a killer course.
The precomputer-form era also relied heavily on written materials such as solicited and unsolicited student letters, alumni surveys, and student-run course guides. Though relatively small in number by today's standards, these comments were, and justifiably so, taken seriously. It is hard to imagine a university administrator dismissing three or four unsolicited, detailed, highly negative student letters with the comment "too small a sample." Instructor assessments in student-run course guides in particular were carefully scrutinized despite the well-understood limitations of such publications. Fraternities and sororities also would pass on traditional wisdom on pedagogical virtue. As was the case with course enrollment, nobody would claim that such information gave the whole story on competence; it provided bits and pieces of data to be used with lots of other information.
A third general and widely available source of information was course materials. A close look at a syllabus would reveal choices on readings, paper assignments, and an overall tone of the course. How much class time was spent on student reports? Did teaching assistants conduct too many classes? Examination questions might also reflect teaching quality: endless true/false questions on trivial details do not proclaim a deep commitment to serious thought. At a minimum, such materials show what students might have learned if great teaching helped them to learn it all. Judgments on course quality are not difficult and are routinely performed when considering transfer credits from other universities. You do not have to be a genius to spot a Mickey Mouse education.
Finally, one can and should consider the intellectual stature of a professor apart from anything occurring in the classroom. This can be delicate, but it is appropriate. It is difficult to imagine how a colleague hopelessly confused about the simplest ideas can effectively teach. Can a fool intoxicated with high-sounding jargon be expected to be a different person when performing before students? Conversations over lunch can reveal the grasp of a field, familiarity with current research, and intellectual enthusiasm. To be blunt, a genius may or may not be a decent teacher; it is highly unlikely that a third-rate intellect will ever be a decent teacher. One might argue that less damage is done by the incoherent genius than the well-trained "effective" fool (better not to learn than to learn foolishness).
These, then, were the major sources of information that could be employed to assess effective teaching in the pre-modern period. In reviewing these sources, several conclusions are clear. First, no one piece of information could prevail in a final judgment. Reaching a conclusion was like putting together a complex jigsaw puzzle: enrollment figures had to be judged together with course material, whereas unsolicited testimonials or criticisms made no sense without considering the character of the instructor. How is a heavy reading load of second-rate books to be assessed? What if the entire field itself was largely rubbish? Under such conditions, any conclusion was a bit messy and open to differences of opinion. Committees could easily disagree sharply on whether Professor X was a good, mediocre, or poor teacher. For those who like things cut and dried, this evaluating process was troublesome not because the information itself was faulty; rather, the rules for collecting and integrating all the information were virtually nonexistent. The assessment process reflected different conceptions of good teaching.
The inconclusive nature of judgments reached under typical conditions also meant that domination was difficult if not impossible. No single person or interest could legitimately and authoritatively render a verdict on who was a good or bad teacher. Politically motivated students might demonstrate in favor of a fired popular teacher, but administrators could counter by noting the lack of rigor in reading assignments, easy grading scales, and letters from a few unsympathetic students. Or, a teacher widely viewed as terrible by most students might nevertheless be judged by his or her colleagues as "brilliant but difficult to follow" on the basis of an outstanding reading list and a first-rate intelligence. In many instances, no easy final conclusion was possible given conflicting testimony and differences of interpretation. Everyone had an opinion. This multiplicity of views would today be called imprecise or unsystematic.
The emergence of the standardized evaluation form has replaced pluralism with centralization. Other and more traditional mechanisms of judgment--enrollment, course material, letters, and intellectual stature--are now of secondary importance and may have disappeared altogether. This change is easy to understand. The standard measurement device reeks of "science" and, in most contemporary universities, anything to do with science prevails over subjective, impressionistic, near-random sources of information. Moreover, sending out and tabulating thousands of computer-scored questionnaires is an administrative snap compared with old-style evaluation. Science is not only powerful, but it also saves a lot of time. This is no small matter to the hard-pressed departmental chairperson or dean. Finally, the new method has the full force of administrative support where it is employed. In effect, all faculty must distribute these forms, and it is this information that forms the basis for salary and promotion decisions. It is inconceivable, for example, that a junior faculty member up for tenure would request an "alternative method" to be used to assess his or her teaching.
The pre-modern system of assessing "good teaching" was disorganized with a multitude of competing criteria and claims. By contrast, the new and improved product is firmly under central administrative control, though faculty have a degree of input. This new power on campus typically is named something like "Office of Instructional Evaluation" or "Teaching Resource Center," but for our purpose we shall refer to it as the Ministry of Teaching (MOT). The MOT authorities are rarely teachers themselves, except perhaps to give courses on how to measure or improve teaching. For the most part, people in fields akin to educational psychology are responsible for assessing the work of philosophers, engineers, psychologists, and chemists.
Given the enormous diversity of disciplines, educational objectives, and class environments, it is predictable that good teaching will be measured crudely. Devising instruments that are truly appropriate to, say, a seminar on Spinoza and an introductory calculus class is well beyond the MOT's capacity. Equally important, because departmental and university rewards are to be made on the basis of these data, easy statistical comparison is required. Global, general questions (e.g., rate this course, rate the instructor with five- or seven-point scales) are thus the only reasonable alternative. There may be discussions of classroom visits, videotaping, and various supplemental techniques, but these almost always remain suggestions for some future new and improved product. Even if such techniques were used, their end product would be a small set of numbers. The triumph of simple-minded consumerism is built into the system of standardized, university-wide assessment.
What is measured is what can be measured within this technical framework. The most critical missing piece of information is what is learned from the course. It is not that the form writers at the MOT do not care about learning; rather, the measurement of learning is not only technically difficult, but it would generate endless controversy (for a sampling of the bewildering complexity of how "learning" might be interpreted see Bloom et al.). The imperfect but far from irrelevant "old-fashioned" methods such as examining reading lists, reviewing paper assignments, and otherwise scrutinizing course substance are victims of the new methodology. Moreover, this attitude of "what cannot be measured precisely is unworthy of serious consideration" is perfectly consistent with the dominant approach among the social sciences that shape evaluation methodology.
Needless to say, MOT officialdom would argue that they are every bit as concerned about learning as the next guy, and teacher ratings do reflect student learning. There are two rejoinders to this claim. First, findings on this subject are somewhat mixed (see Centra, 37-38). There are studies with weak or even negative relationships. Also, the strength of the association varies by type of instructor (rank, experience, and employment status). Second, the clearest evidence for this "good rating equals learning" claim is based on differences in examination scores across different sections of the same course. That is, students of highly rated instructors in multisection courses on average did better than students of poorly rated instructors. The validity of assessments of teaching, then, comes down to relative differences in exam scores under some--but not all--conditions. This is important and certainly cannot be dismissed, but such evidence hardly warrants the conclusion that highly rated instructors are good teachers in the broader meaning of "good teaching." At best, there will be many exceptions, and the research design does not measure what was learned from a given teacher (nor does the research consider the character of the examinations used to measure learning)./1/
This shift in the meaning of "good teaching" is profound. The phrase "Professor X is a good teacher" conjures up an image of someone conveying important knowledge to students, motivating them to make extra efforts, and otherwise having an intellectual impact. Compare this to the phrase "Professor X gets good teacher ratings." Such good ratings may be rooted in any number of pedagogical virtues: everything from being a good teacher in the traditional sense to being a competent technician who explicates a textbook in a lively, animated style. In effect, the MOT and its methodology engage in some bait and switch. The old-fashioned term "good teaching" is kept, but it is given a new meaning. Intellectual development becomes student satisfaction about attempts at intellectual development.
A study conducted by Naftulin, Ware, and Donnelly (1973) illustrates the possible contradiction between good teaching and good ratings. A paid professional actor calling himself Dr. Fox gave a graduate-level lecture that was purposely incoherent and without content. But it was done in a most enthusiastic and engaging manner. An experienced group of educators nevertheless gave Dr. Fox high marks for teaching. One can only imagine the ratings if Dr. Fox were lecturing undergraduates.
More is involved than putting new meaning in an old concept. This transformation determines who is and is not a "good teacher." In the days before MOT ascendancy, a good teacher was someone who positively influenced some absolute but unknown number of students. It might be said, for example, that Morris Cohen (the legendary City College of New York philosophy professor) or Leo Strauss (the University of Chicago teacher of political philosophy) profoundly affected the lives of hundreds, maybe thousands of students. A less famous good teacher might only reach a few dozen students a year. For all intents and purposes, those students who took a course from Morris Cohen or Leo Strauss and thought the experience a waste of time were irrelevant in determining a teacher's reputation. Indeed, a great teacher's reputation might rest on the subsequent accomplishments of one or two students a year over a forty-year teaching career. Only successes counted.
The modern, standardized mass-distributed evaluation form replaces absolute numbers with proportions. Being a good teacher now means having a certain proportion of highly positive ratings. In other words, having awakened the intellectual passions of, say, 50 students counts for little if one antagonized 150 students in the same class. But, if these 150 unappreciative souls were to drop out or not fill out the evaluation questionnaire, one would become a good teacher. In the language of test taking, teaching is now scored right minus wrong. And, given the typical pattern of rating scores, a few highly negative ratings can devastate a large number of positive judgments (three high ratings plus one very low score says "average teacher").
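The arithmetic behind this paragraph can be made concrete with a short sketch. Everything here is an illustrative assumption rather than anything taken from an actual rating form: the 1-to-5 scale, the threshold of 4 for a "favorable" rating, and the hypothetical class in which 50 students give the top score and 150 give the bottom score.

```python
from statistics import mean

def proportion_favorable(ratings, threshold=4):
    """Share of ratings at or above `threshold` (assumed 1-5 scale)."""
    return sum(r >= threshold for r in ratings) / len(ratings)

# Hypothetical class of 200: 50 awakened students (5) and 150 antagonized ones (1).
full_class = [5] * 50 + [1] * 150

# The same class if the 150 unappreciative students drop out or skip the form.
respondents_only = [5] * 50

# The absolute count of students reached is identical in both cases.
reached_full = sum(r == 5 for r in full_class)
reached_respondents = sum(r == 5 for r in respondents_only)

# The proportion-based verdict is not.
share_full = proportion_favorable(full_class)
share_respondents = proportion_favorable(respondents_only)

# One very low score among three high ones pulls the mean down a full point.
mean_three_high = mean([5, 5, 5])
mean_with_one_low = mean([5, 5, 5, 1])

print(reached_full, reached_respondents)
print(share_full, share_respondents)
print(mean_three_high, mean_with_one_low)
```

Under these assumptions the teacher reaches the same 50 students either way, yet the favorable share moves from one quarter to all respondents purely because of who filled out the form. Whether a mean of 4 reads as "average" depends on local rating norms, which the sketch does not model.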
The use of proportions rather than absolute numbers is a change that does not affect everyone equally. A teacher whose demands and techniques divide students into lovers and haters can never rise above mediocre ratings. Woe to that person who appeals only to the brightest and most talented. One can only imagine how the legendary great teachers of the past would be rated today. A teacher who cannot (or will not) reach out to each and every student no matter how inept or uninterested will suffer when rewards based on teaching are dispensed. Obviously, a teacher intent on higher ratings has the same incentive to eliminate potential malcontents as to impress potential fans. Good teaching, then, could very well become a damage-control activity.
Having reduced teaching to a set of statistical data, the MOT is now able to "scientifically" distinguish the good from the bad, the gifted from the atrocious. Almost always this is done on the basis of relative standing: a formula is created defining, say, the top 20 percent of the array as "excellent," the next 20 percent as "above average," and so on. Strictly speaking, a good teacher is only great in terms of some statistical formula. Such purely statistical treatment guarantees that every university and department will have a normal complement of great to well below-average instructors. This computational division of teaching excellence is true even if raw numbers are employed--a good teacher may be someone who has, say, 50 percent of his or her ratings above a 3 on a 5-point scale ("Our students are stingy with their praise...").
The reduction of teaching quality to a data set managed by the MOT has major consequences. Most obviously, administrators via technical statistical decisions can shape the meaning of good teaching and who is rewarded and punished. For example, by defining "excellence" as being beyond the third standard deviation, the number of excellent teachers will be small. Or, if more great teaching is to be commended, the dividing line can be lowered. Similarly, the choice of comparison can be changed to yield a different picture of teaching. Teachers of large introductory classes (who often receive lower ratings) can be compared only with others teaching similar courses rather than those teaching small upper-division elective courses. The result would probably be a sudden "improvement" in the quality of introductory instruction.
This control of how teaching data will be treated and interpreted is a political windfall for university administrators. Recall from my earlier analysis that in the pre-modern days determining quality teaching was an activity with many participants, none of whom could speak with absolute authority. Today, thanks to the MOT, administrators can marshal impressive scientific-looking data in defense of their decisions. Delegations of students outraged over the firing of a popular teacher can be cooled out with scientific hard evidence. A brilliant researcher who cares nothing for teaching can be saved from charges of irresponsibility. The clamor for rewarding teaching in a research-driven environment has been addressed. ("Whereas Professor X may have done poorly in introductory courses, his ratings in graduate seminars were among the best in the department.")
To understand how this can be accomplished, one must first grasp how modern assessment differs from the pre-technology days. In the old days, the assessment gave disproportionate weight to more extreme opinion./2/ Students would be motivated to write letters only about very good or very bad teachers; ordinary classes would elicit about as much response and recollection as a meal at McDonald's. Likewise, nondescript course material would hardly get noticed. An administrator working with an evaluation of teaching was like a painter working with a largely black-and-white palette. Today, because all students participate in evaluations, the amount of low-intensity middling opinion is not only much greater, it probably dominates most course and teacher ratings. A U-shaped distribution has been made into a more "normal" distribution via compulsory student participation.
The resourceful administrator now has something to work with. For example, when firing a popular teacher with a vocal following, the administrator could probably note that "a significant number" of students gave the instructor average ratings (remember, such average ratings rarely surfaced in the pre-MOT days). The administrator could honestly argue that these vocal protesters do not speak for the silent majority. The inclusion of the once nonexistent middling reactions can also be used to bolster the case of a poor teacher given tenure: "Whereas the average rating was poor, nevertheless there were a sizable number of students who rated this instructor as acceptable." An especially industrious administrator could review data from years of teaching and devise the most creative comparisons to make the best or worst cases for distributing rewards based on teaching. And, remember, such data interpretations are very likely to be the exclusive property of administrators--disgruntled students must rely on "soft," impressionistic materials to argue their case.
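A minimal sketch shows how a relative-standing formula and a change of comparison group can relabel the same instructors without any change in their scores. The instructor names, the mean ratings, and the three lower band labels ("average," "below average," "poor") are hypothetical; only the top two labels and the 20-percent bands come from the example in the text.

```python
def quintile_labels(scores):
    """Label each instructor by rank position alone, in 20-percent bands."""
    labels = ["excellent", "above average", "average", "below average", "poor"]
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    return {name: labels[min(i * 5 // n, 4)] for i, name in enumerate(ranked)}

# Hypothetical department: A-E teach small upper-division seminars,
# F-J teach large introductory classes (which tend to draw lower ratings).
seminar = {"A": 4.8, "B": 4.7, "C": 4.6, "D": 4.5, "E": 4.4}
intro = {"F": 3.6, "G": 3.5, "H": 3.4, "I": 3.3, "J": 3.2}

pooled = quintile_labels({**seminar, **intro})
intro_only = quintile_labels(intro)
seminar_only = quintile_labels(seminar)

# Same score for F, different verdict depending on the comparison group.
print(pooled["F"], "->", intro_only["F"])
# A uniformly well-rated group still produces someone labeled "poor".
print(seminar_only["E"])
```

In this sketch instructor F moves from the middle band to the top band once introductory teachers are ranked only against one another, and instructor E, with a mean rating of 4.4, lands in the bottom band of the seminar group. The formula always fills every band, whatever the raw scores are.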
In sum, the new approach greatly facilitates the administrative management of a once difficult-to-control activity. No longer will deans lose sleep over the possibility that students will embarrass the university by proclaiming a distinguished getter of research grants to be a classroom catastrophe. Outsiders prone to speeches on quality teaching can be wowed with all the apparatus dedicated to collecting and interpreting teaching performance ("20,000 questionnaires a year are distributed...a staff of six monitors these data...all instructors are judged by them..."). In a word, from the perspective of the university administration, the teaching issue has been domesticated.
Our analysis thus far has stressed the advantages gained by the university. These benefits have existed for over a decade, though they are rarely noticed. Even less visible are the possibilities for the calculating instructor and political groups. The MOT may claim that its handiwork is some neutral, valid, and reliable mechanical device that is tamper proof. In reality, however, it can be used for personal and political gain. Such uses are not yet common, but it is only a matter of time. Indeed, twenty years from now the whole scientific assessment process may become a nightmare, an apparatus in unfriendly hands.
Let us start simply and consider a mythical Professor Prince who, despite limited aptitude, is determined to be deemed a good teacher. Being a good social scientist, Prince knows that one's ratings must surely be influenced by factors other than actual classroom performance. Some might be beyond easy control, but others are tempting possibilities. After a few days in the education library reading the likes of the Journal of Educational Psychology, Prince has uncovered all sorts of seemingly harmless and uncontroversial ways of enhancing his ratings. His goal is not to be considered a great teacher--this is impossible; rather, the objective is to transform ordinary ratings into above average or even good ones. He reasons that it is far more cost effective to improve his teaching scores than to attempt to improve his teaching. Based on research on teaching assessment variability, here is Prince's plan.
Stay in the room when ratings are completed. This is usually prohibited, but it improves scores and it is a detail not part of the official record.
Make sure that students know that their ratings are to be used in your promotions. Again, an invisible method to squeeze out a slightly higher rating.
Never distribute rating sheets during a final exam as this depresses scores. Give forms out when only the hard-core students are present.
Encourage students to believe that they will get a higher grade. Students are more charitable when they believe that they will be handsomely rewarded.
Beg off teaching required courses. These can be killers.
Periodically teach very small classes. Such courses are almost sure bets for very high ratings.
Teach upper-level courses that encourage majors and/or those with an existing interest in the subject matter.
Be enthusiastic about the material. Remember the Dr. Fox study./3/
It is unlikely that the true nature of Professor Prince's strategy will be detected. The desire to teach small classes catering to junior and senior majors is hardly evidence of manipulating the system. We must also realize, of course, that this same knowledge can be used to undermine an academic career. Professor Prince eventually becomes department chairperson and now decides to do in his rival Professor Louis. Prince pleads with Louis "to help out the department" and teach the 1,000-student "American government for non-majors" course. Louis agrees but does so without much enthusiasm. Meanwhile, traditionally weaker instructors are given small, upper-division classes limited to majors. At salary time, Prince regretfully tells Louis that his teaching performance was among the department's worst. Louis does not understand; he had always been among the department's better teachers.
As an understanding of what shapes good and bad ratings gradually becomes common university knowledge, demands will occur for "adjustments." Those who regularly teach courses biased toward low ratings might insist that their averages should be adjusted upward. Similar demands might emanate from novice instructors, those who have physical limitations, or those teaching a course for the first time. As each legitimate concern is translated into a statistical correction factor, overall evaluation policy becomes as complicated as the U.S. farm price-support policy. The MOT will hire people to churn out all types of weighted and corrected data on who really teaches well. Entire departments may join the battle to maintain their standing as a good teaching department by insisting that the inflated ratings of other departments be normalized. Needless to say, as the data on good teaching become numerically complex and further removed from intuitive understandings of teaching, the opportunity for manipulation grows accordingly. At some point, the data become meaningless, akin to the official currency exchange rates in former communist countries.
Real-world politics can easily intrude into this allegedly objective evaluative process. Filling out a performance sheet is an anonymous, unchallenged act. A student can give any rating whatsoever, and nobody can appeal, ask for a justification, demand evidence, or otherwise escape the whim. It is inevitable that campus political groups will come to see teaching evaluations as a way to exert pressure. Correct-thinking instructors can be rewarded for their non-class activities with superb teaching evaluations. Conversely, the politically incorrect can be punished regardless of classroom performance. The use of punishment is especially likely because a few highly negative judgments can easily lower an instructor's relative standing. A well-organized group could pack a course, thus ensuring terrible ratings. The victim of such an attack, as well as his or her colleagues and the administration, might not even suspect what has occurred. Now the assault is not only invisible, but it is draped in scientific objectivity!
Even more serious would be the use of the evaluative mechanism to impose political views on teachers and students. Already, on at least one campus, there are mandatory questions regarding instructor sensitivity to diversity, his or her attention to the contributions of minorities, and the like. Such efforts are clearly attempts to impose views in the classroom. A teacher who, for example, gave a traditional and straightforward explication of the U.S. Constitution might suffer if he or she did not spend time talking about the suppression of African Americans, or even the possibility that Alexander Hamilton was gay. Even those teaching courses on topics such as statistics or accounting might not escape such questions. After all, examples can convey messages regarding domination. An instructor who objected to such political monitoring would be no different than the Luddite who objected to standardized teaching evaluations. The withholding of pay increases and promotions would be almost automatic.
A more advanced form of political control could be to separate evaluations by student group characteristics. Thus, one would be evaluated not by students overall, but by separate groupings of students: African American, women, people of color, or whatever. The impact of this could be devastating. Imagine, for the moment, that an instructor consistently received low scores from black students. In today's environment where the recruitment and retention of black students is a high priority, such failure is more than just plain old bad teaching. One is now an obstacle to social justice. Charges of institutional racism would likely be raised. No doubt efforts would be made to counsel such an instructor; perhaps a special section of the MOT would offer workshops on communicating with students of diverse backgrounds.
Taken together, the introduction of political criteria in the evaluation process would probably further politically homogenize the social sciences. An instructor who politically antagonized or even challenged students would now be deemed an ineffective teacher. Routine discussions of political and social topics would be comparable to navigating a mine field. Some instructors, no doubt, would find it impossible "to clean up their act." In extreme cases they would be denied tenure due to teaching "problems"; others might seek safety in administration or in small, more technical classes. Many might foresee the problems and avoid academic life altogether. Courses containing potentially controversial material would thus become the property of those catering to the prevailing political orthodoxy. In short, the MOT becomes an administrative arm of the campus political police.
The emergence of a Consumer Reports-like evaluation system must be viewed as part of a larger shift in campus life. Not unlike the national government, college administrators now speak of managing, not solving, problems. This is a critical distinction. In medicine, diseases such as the flu can be cured; diseases such as high blood pressure or diabetes can only be managed. Similarly, whereas a shortage of classroom space may be solved by constructing new facilities, problems such as recruiting a properly diverse faculty, keeping tuition affordable, reducing campus crime, and improving teaching fall into the "manage" category. That is, action must be taken to get vocal critics off one's back regardless of whether such action truly cures the problem.
This shift to "problem management" is easily understood and predictable. Basically, universities are being asked to solve problems that are either inherently insolvable or not solvable with available resources and powers. Administrators cannot command even the most skilled and motivated teachers to transform unintelligent, uninterested students into thoughtful scholars. No teacher in today's atmosphere can force students to read or even attend class. Outside of areas where a degree has a very high cash value (e.g., medicine, business, law), being a tough, demanding teacher risks declining enrollment. It might be possible for, say, a top-ranked accounting or engineering department to insist on high standards, but this would be risky business for fields such as political science, sociology, anthropology, and other departments lacking significant vocational value. To demand reading assignments and hold students accountable for what they read in today's no-growth environment is to invite deans to reallocate resources to departments with growing enrollments.
At some deeper level, university administrators must realize that they cannot significantly (or even marginally) improve the overall level of teaching. Serious efforts to do so will be resisted by both students and faculty. How many professors would welcome close scrutiny of their reading lists or examinations? Imagine the response if a dean were to support compulsory teaching workshops for those in the bottom quarter of students' ratings. Anybody for random videotaping of all teachers? How many students would relish the idea of having to rewrite poorly drafted papers? How many departments would fire the productive grant-getter because of teaching deficiencies? In effect, virtually everybody wants good teaching but they want it cheaply. Universities find themselves in a position comparable to U.S. automobile manufacturers: improving quality would cost billions, but an advertising campaign stressing quality costs only a few million. Public relations to the rescue.
It would be easy to criticize universities for a cosmetic approach. We are all familiar with blue-ribbon reports calling for "serious efforts" to do this or that, a "deeper commitment" to some cause, and on and on. Such criticisms are even embraced by administrators who, like defendants in 1930s political show trials, readily admit their shortcomings. Such criticisms, however, miss the point: the entire MOT system is popular because it seems to satisfy the most people easily and cheaply. Students are getting input, faculty are not hounded for indifferent effort, administrators are in charge of an easily managed system, and critical outsiders can be impressed by a commitment to quality teaching. The existing system of assessing good teaching is a good system not because it improves teaching, but because it helps accommodate diverse interests at the lowest possible cost. It is not mere public relations; it is great public relations: a textbook case of successful problem management.
It is also a fine approach for fending off lawsuits from irate students and parents who might feel cheated by inadequate instruction. Suits demanding tuition refunds and damages have already occurred. For example, a student at the University of Bridgeport argued in court (Ianello v. The University of Bridgeport) that a course contained little substance, did not correspond to its catalog description, and did not permit adequate evaluation of student work. The university won that case, but a Tennessee court upheld a suit claiming that a Vanderbilt doctoral program was inadequate. Systematic, extensive teacher evaluations can thus serve the same institution-protection function as detailed affirmative action plans. Both provide written records that serve as legal evidence of good intentions.
This litigation-deterrent value is likewise relevant when universities choose to fire teachers from "protected" groups such as blacks or women. In the case of Northern Illinois University v. Fair Employment Practices Commission, a woman professor sued the university on the grounds that salary increases and promotion were denied due to sexual discrimination. The university's defense was greatly strengthened by low teaching ratings. As more and more personnel cases go to court, the MOT may prove an invaluable player because it is one of the few sources of legitimate hard data in termination proceedings. (By comparison, judgments about research quality, being inherently "subjective," will always be open to easy charges of conscious and unconscious bias.)
In all, it is no wonder that the use of standardized teaching assessment has gone largely unchallenged. At best, it solves lots of problems quietly, quickly, and at minimal cost. At worst, it unfairly punishes a few good teachers mistakenly labeled poor at salary-review time. To the hard-pressed administrator, it must be one of the greatest inventions of all time.
To paraphrase a campaign slogan of a few years back, "Ask yourselves, are we better off today with the rise of standardized teaching evaluation than we were forty years ago?" Obviously, as this symposium has made clear, lots of people would say "yes." A pre-modern, chaotic, impressionistic, and often inconclusive process has been organized and "upgraded" thanks to modern technology. Everybody seems to be happier, from deans who now get simple, clear-cut reports to students who now enjoy pay-back time. It is a system that manages away all sorts of difficult-to-solve problems yet is flexible enough to adapt to special needs.
Our emphasis, however, has been on the dark side. A price has been paid for all these benefits. Certain people will never get rewarded for great teaching. An instructor who profoundly transforms a few students each year but who may antagonize the multitude will never receive proper recognition. A certain number of half-witted fools will receive undeserved teaching awards. Because there is no longer any need to go beyond the simple numbers, brilliant reading lists and great examination questions will be appreciated only by their creators. Many will be more motivated to work the numbers than to take risks or challenge students. Costs such as these will always be invisible.
There are also the costs of surrendered control. The meaning of classroom performance is now no longer in professors' hands or even in the hands of their students. Remember, professors do not choose the evaluative criteria or what constitutes a good job. The goal of enlightening a half-dozen students a year is not part of the input. Professors will be judged by people who do not know them, do not know their subject, and have no idea of what they want to accomplish. It is not Consumer Reports rating microwave ovens; it is the engineers of Consumer Reports rating poetry. Judgments on what is good or bad teaching are now made by technicians, not people versed in the subject.
Perhaps most serious, the door has been opened to greater political control. The teaching evaluation form is a great way of monitoring political orthodoxy. Instructors who would refuse entry to the local political commissar are defenseless against the MOT. Nonparticipation itself could be taken as evidence of guilt. Old-timers will tell wide-eyed novice assistant professors about not having to worry about MOT directives on "unbalanced" teaching. A few may even pine for ancient times when deciding who was and was not a good teacher was a hopeless mess.
Although the purpose of this essay is not to criticize the well-intentioned people of the MOT, it is remarkable how little they seem to be concerned with the validity of their instruments. Books addressed to professionals in teacher evaluations typically devote a few pages to this critical topic. A few citations of some very limited studies once and for all "solve" the question of whether highly rated teachers are really good teachers. The current state of the art is well illustrated by a chapter called "Measures of Student Learning," by John L.D. Clark, a senior examiner at the prestigious Educational Testing Service (Centra 1980). The chapter consists entirely of what ought to be done in the future. No mention is made of existing measures of student learning beyond the self-evident, such as tests. From the statistical process point of view, there is little to correlate with student evaluation scores. It is almost as if the identity between what is measured and what exists is so strong that only a perfunctory reference is necessary. This, of course, is to be expected: for a MOT functionary to argue against the validity of the standardized assessment devices would be to argue against one's own job and profession.
Analysis here draws on the work of Ginsberg (1986). Ginsberg argues that the modern poll domesticates public opinion by transforming it from a spontaneous activity dominated by intense opinion to a passive activity in which "soft" views prevail. By choosing when questions are asked and the specific form of the questions, those who control polls control public opinion. Public actions are replaced by interpretations of questionnaire findings.
This analysis is drawn from the summary provided by Braskamp, Brandenburg, and Ory (1984).
Bloom, B.S., et al. (1956). Taxonomy of Educational Objectives, Handbook 1: Cognitive Domain. New York: Longmans, Green.
Braskamp, L.A., Brandenburg, D.C., & Ory, J.C. (1984). Evaluating Teaching Effectiveness: A Practical Guide. Beverly Hills, Calif.: Sage Publications.
Cashin, W.E. (1988). Student Ratings of Teaching: A Summary of the Research. Manhattan, Kans.: Center for Faculty Evaluation and Development.
Centra, John A. (1980). Determining Faculty Effectiveness. San Francisco: Jossey-Bass.
Crooks, T.J., & Kane, M.J. (1981). The generalizability of student ratings of instructors: Item specificity and section effects. Review of Education Research, 15, 305-13.
Doyle, Kenneth O. (1983). Evaluating Teaching. Lexington, Mass.: Lexington Books.
Feldman, K.A. (1983). Seniority and experience of college teachers as related to evaluations they receive from students. Research in Higher Education, 18, 149-72.
Ginsberg, Benjamin. (1986). The Captive Public: How Mass Opinion Promotes State Power. New York: Basic Books.
Naftulin, D.H., Ware, J.E., & Donnelly, F.A. (1973). The Doctor Fox lecture: A paradigm of educational seduction. Journal of Medical Education, 48, 630-35.
Ory, John C. (1991). Changes in evaluating teaching in higher education. Theory Into Practice, 30, 30-36.