Deconstructing Evaluations

Deconstructing An Evaluation Form

Paul Trout
English, Montana State University

"I would not want to be assessed by this form. I am not contained or can not be contained in a bubble sheet."
--A student comment on a Knapp evaluation form, 1997.

Introduction

The Knapp form, given to over 19,000 students by twenty-two departments to assess the effectiveness of classroom instruction in 960 classes at Montana State University each semester, is a pretty typical instructor evaluation form.

It asks students to rate the instructor (not the course) in eight categories: mastery of subject matter, organization of course, clarity of presentation, stimulation of interest, availability for assistance, impartiality on grades and examinations, concern for student, and overall effectiveness. These items are relatively generic, and appear on most other forms used around the country; what I say about them, then, should be of interest to Montana professors at other campuses in the system.

The form is called the "Knapp" form because Stuart Knapp introduced it when he was Vice President for Academic Affairs at MSU (1978). He brought the form with him when he came to MSU from Oregon State University. Under pressure to evaluate teaching, Knapp offered this form as one way to do it. At the time this form was first introduced, no one ever explained why the form contained these items and not others: no one claimed that these items defined effective teaching, that they related to student learning, or that they were sanctioned by the best research on pedagogy. They were simply the items on an old computer-punch card that had made its way from Oregon State to the boondocks of Montana.

When first introduced, the Knapp form was used with the same frequency as the Aleamoni form (each department may choose whatever instrument it wants, or devise its own). Then, slowly, the Knapp form was adopted by more and more departments, now accounting for about 75 per cent of evaluation forms processed by the MSU Testing Center each semester, apparently because it serves the needs of both faculty and administrators. Presumably, its popularity also reflects the widespread conviction among faculty and administrators that the form provides the "unbiased, statistically significant, valid and reliable" data that Stuart Knapp himself said were required to make fair administrative decisions about faculty retention, tenure, promotion, and merit (Knapp et al. 5; underlining in the original). For example, back in 1980 an administrator wrote to me that he preferred the Knapp form because the items on it were "not ambiguous."

At the time, this was a strange claim to make, because when the Knapp form (as it was soon to be called) was being circulated for examination and comment, someone (was it Stu Knapp?) thought that the items were ambiguous enough to require explanation: a handout was circulated that glossed each item. For example, "stimulation of interest," according to the glosser, meant "motivating students to get the job done, appropriate variation of activities"; "impartiality on grades and examinations" meant "disclosure and adherence to specific statements of expectation, clear criteria of evaluation"; and "concern for student" was said to mean "concern for how students learn & progress in the course." It seems odd that no one had sense enough at the time to recommend that the Knapp wording be dropped and replaced by the clearer wording of the glosses.

This effort to elucidate the Knapp items was intended, I suspect, to reassure skeptical faculty members that the items on this new form would be interpreted by students precisely as the faculty would want them to be. But I think faculty and students tend to interpret these items quite differently. This is hardly surprising. The items are very general, and can be interpreted differently by different respondents. They are, in essence, subjective, a problem that one expert on the wording of evaluation items believes plagues almost all evaluation forms and that renders such forms unfit for personnel decisions (1).

In this essay I analyze the items on the Knapp form in light of what experts on evaluation forms have to say about their cogency (reliability, biases, possible meanings, etc.). The upshot of this deconstruction is to dispute the notion that this form (or any other form with similar items) provides unbiased, valid and reliable data about the effectiveness of an instructor's classroom teaching. To reward and punish the instructional competence of professional educators by means of an instrument that is unfair and ambiguous is fraudulent.

I deconstruct this form not because my Knapp scores are low (I worry instead that they are too high), not because I resent or fear students, and not because I believe students are incapable of evaluating some things that occur in the classroom. I deconstruct the Knapp form because I believe that it provides, as do other forms with similar items, perverse incentives for college teachers to do the wrong things as educators, and that it therefore does more to harm classroom instruction than to improve it.

Now, let's examine more closely each of the eight items on the form.

Availability for assistance

This item was glossed to mean "helpfulness for individual needs & problems." But many students probably understand the item to measure how easy it was for them to see a professor after class. As worded, the item conveys no sense of scope; the implication is that the professor should be readily available to assist students whenever they seek assistance, that is, at their convenience. I suspect this is how some students do indeed interpret the item. How often I've heard students huff, "You're never in!" which means, translated from studentese, "I stopped by your office a couple of times but you weren't in." Notice that the item does not ask students if the professor kept office hours, a more reasonable and less subjective enquiry.

This item is not only ambiguous; it is vulnerable to biasing variables. One study found that "students in larger, impersonal lecture sections are more likely to perceive their professors to be...unavailable outside of class" than students in smaller classes--regardless of the number of office hours the professor keeps (Shingles 465). Stephen Ceci was able to raise his scores on "How accessible is the instructor outside of class time?" from 2.99 (Fall) to 4.06 (Spring) merely by being a more expressive lecturer in class--even though he kept "identical office hours both semesters and made himself equally available for student appointments and consultations over both semesters" (Williams 20). How available students think an instructor was may have a tenuous relationship to reality indeed.

An instructor's real and perceived availability is a function of the course loads, course schedules, work schedules, and social commitments of the students, and of the number of courses taught by the instructor, when the courses meet during the day, where the classrooms are located, how many students are in each class, how many committees the instructor serves on, etc. It seems inherently unfair to hold instructors responsible for things over which they have little or no control.

Most departments have policies stipulating the appropriate number of office hours each faculty member should keep. There are better ways to find out if faculty are keeping them than by asking students to respond to a broad and inherently subjective item that reveals almost nothing useful about the teacher.

Clarity of presentation

This item is also open to various subjective interpretations. Being "clear" when presenting material and giving directions to students is usually a good thing, so long as the clarity is not achieved by skirting the complexities and difficulties. This item also seems somewhat biased in favor of lecturers, and against instructors who do not so much present material as engage students in Socratic dialogue or facilitate group discussion.

In a provocative article, Fritz Machlup problematizes this item even more by exposing one of the most bewildering ironies of conventionally defined "good" teaching, that is, teaching that presents material to students in a clear manner. Teachers praised for their "lucid exposition," Machlup observes, sometimes produce students who perform worse than students of notoriously poor teachers. "The teacher's performance that prompts witnesses to say he is a good teacher is not always equivalent to good teaching if good teaching is what produces good learning" (376; italics in the original). Machlup plausibly hypothesizes that a "good" teacher who presents material "with extraordinary lucidity (so that the students hardly notice the inherent complexities of the subject and grasp easily even the most subtle interrelations among its components), gives the students a feeling of comprehension and mastery of the subject," with the result that some students conclude that they can safely neglect required or supplementary reading and direct their energies to other courses that seem harder to grasp. Ironically, an inept teacher who does not present the material clearly leaves students adrift and unable to see how things hang together. Confused, these students respond by looking for clarification in assigned readings or through study groups and discussions (378). In a nutshell, crystal clear lectures may encourage some students to skimp on out-of-class studying.

Machlup, of course, is not urging professors purposely to be unclear, but rather to design their courses to make complementary both the clear presentation of material and rigorous out-of-class study. But his analysis does suggest that even traits widely accepted as desiderata of good teaching are not always beneficial in all contexts and for all students.

Let's take Machlup's paradox a bit further: should not effective teaching, at times, disturb and confuse students, rendering their ostensibly "clear" understanding of things decidedly unclear? This may happen more in some disciplines (Philosophy) than in others (Computer Science), perhaps, but does not all education, all learning, presume cognitive dissonance? As one scholar puts it: "Cognitive developmental theory strongly suggests that there must be disequilibrium between what is known and what is to be known; for development to occur, students must experience dissonance, perceived as uncertainty, disorder, surprise, and murkiness. Thus, an instructor who receives consistently high ratings in clarity and organization may not be establishing an educational environment which creates cognitive disequilibrium" (T. Wilson 92, note 1). Episodes of student confusion signify that education and learning are taking place.

One need not practice Socratic or gnomic pedagogy to see that "clarity of presentation" could mean different things in different contexts and may or may not be a sign of effective teaching if that is defined in terms of student learning and performance. So why measure it and not something else that is more important?

Organization of course

Judging from what students have written on many narrative evaluation forms I've read over the last few years as a member of my department's performance review committee, what students seem to be evaluating with this item is the extent to which they know at the start of the semester what will be covered in the course and during each class meeting, when exams are scheduled and assignments due, and whether the instructor "stuck" to the syllabus. These are very important to students. As Neath advises, "follow the syllabus. Do not try to do more. End the semester cleanly. Do not miss class. Start on time, end class on time, bring extra chalk or overhead pens to class, and keep all of your lecture notes in a three-ring binder with neatly punched holes" (1365). In short, be tidy.

While none of these traits are bad, they do not capture the essence of effective teaching, any more than a clean desk would define an effective writer. Indeed, McKeachie has concluded, "most presumed essentials of good teaching, such as organization, warmth, or research ability, are not highly valid" indicators of teaching effectiveness ("Reprise" 392). And Scriven has long argued that evaluation forms should not have an item about "organization" because teaching effectiveness can be achieved without it (McKeachie "Validity" 1218).

While students may be able to tell us whether the course looked tidy to them, they are probably not the best judges of whether or not the course was effectively organized from a pedagogic point of view. Instructors represent a select group, the vast majority of whom are presumably well enough organized to have earned advanced degrees and achieved a high level of intellectual competence. Do we really need to find out if the instructors hired by the University are organized enough to actually teach? If there is some doubt as to an individual faculty member's ability to organize a course effectively, peers would be better judges of whether or not a course is well conceived and organized according to standards and practices within the department and field. This item on an evaluation form garners trivial data of little interest or use.

Mastery of subject matter

There is widespread consensus among those who study these forms that the vast majority of students are in no position to determine the degree to which the instructor knows the subject or whether the instructor "has provided a complete, current, or even adequate treatment of the subject" (Theall "On" 3; "Using" 86). Seldin, a long-time defender of SNEFs, asserts that students "should not be expected to judge whether the materials used in a course are up to date or how well the instructor knows the subject matter of the course. These judgments require professional background and are best left to the professor's colleagues" (A40). McKeachie observes that "if faculty members have doubts about an instructor's competence in the subject matter, it seems illogical for them to turn to students for such judgments" ("Reprise" 388). Marque also believes that "it is unlikely that a student would be able to accurately assess his/her instructor's general knowledge of the field." This item "must be assessed by means other than the conventional student rating scales" (848). Dan Bernstein, a professor of psychology at the University of Nebraska, also believes that students are unable to tell "if material is up to date, if we're knowledgeable, and if they are learning something" (R. Wilson A14). Machlup concludes, "students cannot be judges of the content of what they are taught" (377).

Research also shows that this item gives rise to a bitter irony: a study by D. H. Elliott ("Characteristics and Relationships of Various Criteria of College and University Teaching," Ph.D. dissertation, Purdue University, 1949) found that the more the instructor knew about the subject, the lower students rated his or her teaching effectiveness! (McKeachie "Reprise" 392). So, from a consumer/administrator point of view, a really low score on this item may deserve a merit raise.

Let's assume that an instructor does not know the subject matter. Who should be held responsible for this situation: the instructor? Or the administrator who assigned that course in the first place? "Surely students should not be expected to be better judges of subject-matter competence than the department chairperson, or other administrative offic[i]al" (McKeachie "Reprise" 388).

Asking students again and again to judge the knowledge and professional expertise of those who are paid to teach them not only violates common sense but breeds in students a form of intellectual arrogance that all too often comes out on narrative evaluation forms (as I noted in an earlier MP article): "who gives a damn if we call it elegy or loss? Are these terms used elsewhere in lit? I've never heard of them;" "For being a 100 level class he used a lot of words that I didn't know the definition of, he took for granite that we knew the definition of a lot of words & didn't tell us what they mean;" the instructor "had a tendency to be critical on objective manners such as word choice."

This item, and perhaps the whole evaluation process itself, badly distorts the intellectual relationship that obtains between student and teacher, a relationship that is inherently between unequals (Platt 31).

Concern for students

In an earlier article, I pointed out that one of the problems with this evaluation item is that it invites impression-management manipulations that could raise an instructor's scores on other items (Trout "How" 18-19). In this essay, I want to point out another problem: that faculty and students may interpret this high-inference item quite differently, with significant consequences for the integrity of classroom instruction. First, let's look at how instructors might interpret this item and then at how students might.

Most instructors would argue that the real, fundamental measure of a true "concern for students" is the degree to which the instructor is committed to educating them and to developing their intellectual skills. This is precisely how the gloss writer interpreted the item: "concern for how students learn & progress in the course." Another evaluation form says pretty much the same thing: "Concern. Presentations made subject more understandable; implications, applications, and concern that students learn and understand subject matter were shown" (Overall 858)(2).

According to this definition, an instructor who teaches rigorously would be exemplifying a praiseworthy "concern for students" (3). But do students think that an instructor is showing "concern" for them by requiring them to work hard and reach new levels of performance? Some of them do, certainly, and I have quoted them in an earlier essay (Trout "What" 18). But many others would not (see Trout "What"; Trout "Student Anti-Intellectualism"). Such students, and their numbers are probably increasing, do not look fondly on instruction that increases their stresses and pressures.

According to Schools Speak For Themselves, high-school students want instructors who are "generous," who make "allowances," who make students "feel clever," who are "forgiving," and who are "kind." College students (who were high-school students just a year or so ago) are not that different; according to their statements on narrative evaluation forms, they prize professors who are friendly, nice, personable, accepting, accommodating, and easy-going (4). While these are not bad traits, they can have mischievous consequences when those responsible for educating increasingly underprepared and disengaged students prioritize them as students urge them to (see Trout "What" esp. 14-15).

As Cantor writes, "students favor 'painless education' the way they favor 'painless dentistry' or 'painless medicine.' Many students prefer the least demanding forms of teaching, the ones that require the minimum of work from them or that present the least challenge to their ingrained and comfortable habits of thought" (32; see also Edmundson, Sacks, Bauer). Most students perceive instructors who accommodate this desire for "education lite" as showing greater "concern for students" than instructors who do not. Instructors who resist students' requests for less work, higher grades, etc., are apt to be perceived as downright mean spirited (5).

Some instructors may be able to challenge students, motivate them to work hard, and reward them with appropriate grades, and still receive favorable ratings. All power to them. But most of us recognize that demanding workloads do not help raise evaluation scores and may very well lower them. In an undergraduate law course, Richard Kleeberg received an overall "A" from 80 per cent of his students. Although this score delighted the dean, it troubled Kleeberg, who wondered if he had made his students "too comfortable, by not demanding enough of them." So he made his course more demanding by adding two novels, tripling the number of assignments, and incorporating three new topic areas into the course. And yes, his student ratings went down, "to where now usually about 60 per cent rate me an overall 'A.' I am pleased with this 'improvement,' for I believe it indicates that I am now pushing and challenging more of my students, which should indicate that I am now a more effective teacher" (Kleeberg B4).

Although some instructors can resist the temptation to dumb-down their courses, others will succumb to the perverse incentive to show "concern for students" by reducing grading standards and workloads, by being less attentive to errors in grammar, or by being more willing to suspend or ignore deadlines and requirements. Robert S. Owen, now an assistant professor of marketing at the State University of New York at Oswego, learned the hard way how important it is to accommodate student demands for education lite. Several years ago he lost his job at Bloomsburg University of Pennsylvania when students gave his rigorous teaching mixed reviews. Now he gives multiple-choice rather than essay exams and asks students to evaluate research papers rather than to actually write their own. And he gives extra credit to students who criticize the fairness of a question on a test, because they have at least expressed interest. "The student in college is being treated as a customer in a retail environment," he says, "and I have to worry about customer complaints." "If students come to my office," he says, "I have to make sure they walk out happy" (qtd. in R. Wilson "New" A12).

Owen is not alone in bending with the winds fanned by evaluation forms. As Cantor observes, "More and more colleagues confess to me that they teach with course evaluation forms in mind, cutting back the reading assignments in their classes, for example, because they do not want the chairman of their department being told that they work their students too hard.... In using course evaluation forms, we may well end up judging faculty by the wrong criteria. In many respects, course evaluation forms measure 'niceness' rather than true pedagogical competence" (32-33).

Learning and teaching entail frustration, anxiety, disappointments, shame, pressure, sweat and tears. That's because instructors must test students, grade them, monitor them, correct them, and fail them. They must also, at times, exhort, admonish, embarrass, push, and drive them. When instructors perform these tasks responsibly, they will cause many students stress, anxiety, and psychic pain. As Michael Platt puts it, instructors are "duty bound to displease whenever it would be dishonest to do otherwise" (33). And, as Mark Edmundson puts it, there are times when instructors must be "usefully offensive" (49)(6).

Although it is reasonable to ask students to comment on specific instructional behaviors, such as whether or not the instructor kept office hours or allowed students to ask questions or make suggestions about course requirements, it is inappropriate and harmful to leave it up to students to define and then to rate how much "concern" their instructor had for them.

Impartiality on grades and examinations

"Impartial" means not partial, unprejudiced, emphasizing lack of favoritism in deciding an issue. Other synonyms are fair, just, equitable, unbiased, objective, and dispassionate. Understood this way, "impartiality" is an ethical value that should be embraced by all professional educators. Awarding grades on some basis other than student performance violates a fundamental value of education. If this item on the Knapp form were cogent, and if students were in a position to judge how well an instructor has fulfilled the ethical obligation to be "impartial," then all instructors who used objective exams and who graded on the bell curve should, by definition, receive a perfect score of "4" on this item. And any score lower than "4" should be grounds for an enquiry into the instructor's suitability to continue in the profession.

But scores less than "4" do not trigger inquiries (about the instructor's "impartiality," at any rate) because administrators, and the rest of us, know full well that students are in no position to judge whether or not the instructor was "impartial," and that the different grades students get reflect differences in background, effort, amount learned, and so forth, not the instructor's use of different grading standards for different students (Marsh & Roche 1196). That is, everyone knows that this item indicates not whether the instructor was "impartial," but how students feel about the instructor's evaluation requirements, about the rigor of the exams, and about their grades. Because this category can be and is used by some students to penalize faculty who have not inflated grades, and because that threat of punishment induces some faculty members to lower standards, this probably is the most senseless and pernicious item on the Knapp form.

In a study conducted by Robert W. Powell, evaluation scores on two evaluation items, "the instructor is fair and objective" and "tests and grades are effective and reasonable for the level of the course," were higher (4.61 and 4.46) in a class with high grades than in a class with low grades (3.32 and 3.19), although the grading criteria and most test questions were identical, and the same instructor taught the same course using the same material over the same period of time. "Thus, there is no basis to account for the difference between groups in response to questions 7 and 14 except for the student's grade itself. It is interesting to note that these two questions produced the largest differences between groups" (201-02). Powell also found that the narrative evaluation forms reflected the same difference. "In summary, the comments of the class receiving high grades are overwhelmingly positive, while those of the class with lower grades are predominantly negative. The difference in the tone of the comments is so great that it seems hard to believe that they refer to the same instructor.... Thus, it would appear that the tendency of students to write evaluative comments, and the nature of the comments, are strongly influenced by the student's grade" (203).

Both McKeachie and Shingles also find a strong relationship between the grades students receive, or expect to receive, and how they rate the grading "impartiality" of the instructor. McKeachie writes: "an instructor who is a hard grader is more likely to be rated low on the item, 'Fairness in grading'" ("Reprise" 391). Shingles writes: "The Fairness of Grading index reflects students' satisfaction or dissatisfaction with the teacher's judgment relative to their expectations as to what grade they desire and deserve" (465).

No doubt there are methodological problems with these studies--as there are with just about every other study that examines student evaluation forms. But experience and common sense lend support to the notion that the stringency/leniency of an instructor's grading does affect how students rate the instructor's "impartiality on grades and examinations," and very likely other items on the evaluation form, too (Powell "Faculty" 617, 620-21). Sixty-four per cent of faculty surveyed by Kolevzon believed that "student evaluation forms are responsible for lenient grading;" instructors with more experience agreed with this view more strongly (Kolevzon 208).

Moreover, there are enough credible studies out there to convince even those partial to evaluation forms that it would be wrongheaded to accept at face value the ratings instructors receive on items having to do with grading, or to use these numbers for administrative purposes. As Shingles cautions:

When considering the question of faculty merit for matters of promotion, tenure or salaries,...it is advisable to control for students' perceptions of grades and work load. Student opinions as to...the perceived fairness of the grading system [is] a strong potential source of bias which may easily interfere with their evaluations of other attributes, such as the competence faculty displayed in the classroom, the effort faculty put into teaching, or the benefit of the course to their own intellectual development. Not to control for these or other attributes which may color the students' evaluations may lead to highly misleading and inaccurate conclusions. (Shingles 464)

Greenwald says, "We do not think that teaching careers should be injured when faculty...uphold strict grading standards" ("Grading" 1216). McIntyre is blunter: "Students are not qualified to judge matters requiring professional background; e.g.,...whether the instructor's evaluation procedures are appropriate and technically adequate" (McIntyre 2).

Social scientists can debate all day whether this or that study proves beyond question that an instructor can raise evaluation scores (on one item or another) by giving out higher grades (see Powell "Faculty" 621 and Greenwald "Grading" 1214 for one view, and Theall "Using" 92 for another). What seems less debatable is that many instructors believe that an instructor can improve his chances of getting "good" ratings by assigning high grades, and some of them act accordingly (see Goldman 105-06). In the words of Armour, "Most teachers are sure that, when the time comes for the rating, students are hard on hard graders." An instructor seeking tenure or merit pay is unlikely to wait for research to tell him which life preserver to grab. "Few teachers consciously decide to give high grades in order not to be hurt by the ratings, but subconsciously this event has its impact on grades" (Armour A52). Keep in mind that an instructor can still be "impartial" as I have defined this term even while recalibrating down his or her grade scale (7).

Let's say that this Knapp item seduces some faculty to give out higher grades. What's the problem with that? Well, if "some" faculty lower their grading standards, other faculty will be under pressure to do so too. Students will gravitate to more "generous" and "kind" instructors, and criticize those who seem more unreasonable and demanding. A number of studies have demonstrated that differences in ratings of instructors can be produced by giving the student either a higher or a lower grade than he or she is used to getting (Powell "Faculty" 620-21). In one study, students who thought they would receive a lower final grade than expected rated the instructor significantly lower on a number of aspects of teaching ability than students who received the grade they expected (621). In the context of already inflated grades, where more students have been led to expect higher grades, an instructor who gives out fewer A's and B's than the norm is going against her or his self-interest. So, to compete for students and evaluation scores, some (not all) instructors will lower grading standards (Kolevzon 206). Once grading has become inflated, it is almost impossible to deflate it, for even a modest increase in standards would appear draconian. So, what's the problem with grade inflation? Three things.

First, research shows that students work harder and learn more when they fear getting a low grade, and work less, and thus learn less, when they expect to get a high grade (Greenwald "No Pain" 744). According to Greenwald, "if a professor really wants students to learn, the ideal method is to scare the students into studying" ("Grading" 1209). Unfortunately, since this strict approach to grading will likely lower one's ratings on evaluation forms, the goals of effective instruction and of getting high evaluations are, in the words of Greenwald, "in direct opposition." Lenient grading is wrong and unsound because it discourages student learning and achievement.

Second, lenient grading and the grade inflation that results inevitably lead to the dumbing-down of course work. After all, once instructors habitually give out a lot of A's and B's--a few colleagues give out 80 per cent A's and B's in introductory English courses--what else can be done to mollify disgruntled students and seduce them into giving out even higher scores on evaluations (which are also subject to inflation)? Grade compression leads to student workload deflation.

And third, lenient grading and grade inflation undermine the "sorting" function of education. This sorting function is a topic that educators prefer to avoid, for fear of being branded "elitist." But there is no denying that teachers and professors must make distinctions between levels of student performance, and indeed are duty-bound to do so, since it is part of our social function to identify and develop intelligence and talent.

For a very long time, the function of sorting and selecting talent was fulfilled by high school, but now that function has shifted more to college. College instructors, who are trained to be scholars, are understandably loath to serve as judgmental parents, policemen, or "gatekeepers." Surveys reveal that a sizeable minority of faculty (40.8%) believe that teaching and evaluating roles they have to perform are "basically conflicting" (Kolevzon 208), and that they see grading as a "necessary evil" (Goulden 118). As Goldman explains, "as scholars we are temperamentally unfit for the job, but as guardians or responsible citizens we should know that the job must be done. Our Arcadian dream-like past, that image of a collegial community of scholars, has come into conflict with our newer role of judge and jury in an adversarial relationship. We are understandably uncomfortable and distressed by the dissonance, and seek release. But it is our job to identify, as best we can (and we are certainly imperfect), those who are most fit for particular occupations or social roles. The health and well-being of our society depend upon our success. We have accepted the social function of certifying competencies, so necessary in a huge and complex world" (109).

If higher education, through lenient grading practices, relinquishes its role as a gatekeeper, society must and will find another way to get the job done, perhaps assigning the job to external agencies such as corporations and post-graduation examination makers (Goldman 118; Wooldridge).

Peers are in the best position to judge if an instructor clearly stated his or her performance expectations, and used clear and reasonable criteria to evaluate student performance. All that is "gained" by this item is placing pressure on faculty to please students by giving out lots of high grades.

Stimulation of interest

As survey after survey indicates, increasing numbers of high-school graduates now enter college unprepared for, and "disengaged" from, the intellectual rigors of academic culture. It should not be surprising, then, that such students find most of their courses tedious, uninteresting, irrelevant, and boring (see Trout "What"). No wonder these students value highly "exciting" and "enthusiastic" instructors who manage to "arouse" their interest in material that otherwise they find quite lame.

There is nothing wrong, and a lot right, with teaching enthusiastically to hold the always-in-danger-of-flagging attention of students, with stimulating them to actually take an interest in the course material. But this item may unfairly discriminate against quite effective instructors who are more laid-back, serious, formal, and dry. This is not, I think, a trivial observation, given the potential consequence of failing to sufficiently "stimulate" students. In Christopher Turner v. The President of the University of British Columbia (1993), an arbitration board noted that "while there is no question of Dr. Turner's competence as a teacher at all levels, teaching evaluations for the last several years show that his effectiveness is marred by what students perceive as excessive formality, lack of enthusiasm and dullness" (in Haskell, 5/18, 26 #16). Was this decision fair?

Yes, if a more expressive personality does in fact materially improve student learning and achievement. Then it would be quite fair to reward those instructors who were given high scores for their "stimulation of interest," and to punish those who received low scores. d'Apollonia, a respected expert on evaluation forms, argues that instructor expressiveness is not a "biasing" variable because "it exerts its influence by affecting student learning" (1204-05; see also Marsh 1193). In a review of "Dr. Fox" studies, in which the content and expressiveness of lectures were manipulated to see how they affected the way in which the audience rated the effectiveness of the lecturer, Meier found that there was a positive correlation between the expressiveness of the instructor and student achievement: "The results from the Dr. Fox studies...indicate that students may learn more from an expressive lecturer" (345). Kaufman puts well what most experienced instructors have come to believe: "there is no substitute for enthusiasm in getting students to appreciate books, art, and music that they would, in other circumstances, never choose for themselves. I have discovered that my own excitement over the material we cover invokes a similar reaction in the students--enthusiasm produces intellectual adrenaline, and intellectual adrenaline can help even the most ill-prepared student make progress through difficult material" (56). When students have been stimulated to take an interest in the material and to pay attention, they do better work, develop a better attitude toward the subject, and may want to pursue it further--an outcome sought by all instructors.

But other research suggests that the relationship between expressiveness and student performance is a bit more complex. Abrami, also a widely respected expert on evaluation forms, concludes that although "instructor expressiveness had a substantial impact on student ratings," it had "a small impact on student achievement" (qtd. in Damron "Instructor" 3; see also Marsh 1193). Another scholar who examined the relationship between stimulation of interest and student achievement found that stimulating students to take an interest in the material can be overdone, and is certainly no substitute for rigorous classroom instruction:

If an instructor concentrates on producing a high level of achievement, his students are at least as likely to take subsequent courses and are more likely to do well in those courses than they would have if his emphasis had been on arousing interest and enthusiasm in the subject. On the other hand, the instructor who concentrates on arousing interest in the subject without at the same time taking steps to ensure a high level of achievement may be doing his students a disservice in that they may elect to major in the subject, but lacking the necessary background, they may do poorly in or fail the required courses at the second-year level. The instructor would be well advised to concentrate on student achievement and not to be overly concerned with arousing interest in the subject. (Sullivan 589-90)

In essence, high or low numbers on this item do not reveal anything meaningful about an instructor's classroom effectiveness if that is measured by student learning and achievement.

Instructors who are skilled at the art of impression management, or who are naturally easy-going, funny, friendly, and expressive, like the celebrated "Dr. Fox," "are likely to receive high student ratings whether or not their students have adequately mastered course materials" (Damron "Instructor" 14). High scores on this item, then, may reflect effective teaching, or the very opposite. Besides providing ambiguous if not downright vacuous results, this Knapp item may also contribute to the dumbing down of classroom instruction by seducing some instructors to elevate their high scores by emphasizing entertainment and razzle-dazzle over content, rigorous workloads, and high standards (see Edmundson 39-40; Trout "What"). Notice that the Knapp form does not contain a counterbalancing item that would reward an instructor for teaching a rigorous and challenging course.

Overall effectiveness

What this summative or global item measures is the degree to which students "like" or are "satisfied" with the course. Their overall or global impression of the instructor influences how students rate the instructor on all other categories (Naftulin 631). Since instructors are judged "effective" to the extent that they satisfy students' interests and needs, it is important to understand what those interests and needs are.

As I wrote earlier, students tend to "like" instructors who are friendly, warm, kind, generous, understanding, flexible, tolerant, easy-going, nurturing, enthusiastic, entertaining, and charismatic. They also prize instructors whom they perceive to be prepared, interested in teaching, organized, and helpful. More pragmatically, they want instructors who explain the content to be covered and the requirements to be met (examinations, papers, etc.), who explain assignments clearly, who are audible, and who give examinations that are directly related to material covered in lectures (McKeachie "Reprise" 395; Sheehan 393; Marques et al. 841). While there is nothing wrong with these traits or practices, few of them have been identified as essential to effective teaching if that is defined as teaching that improves the learning and performance of students.

Problems arise not with what students "like" but with what they don't like. They do not like instructors or courses that require a lot of work, that are "too demanding" or that require too much memorization or "extensive thought" (Dressel 345-46). A study that attempted to identify the kinds of teaching students evaluated most highly found that "challenging assignments" and "made the course difficult enough to be interesting" ranked far down the list of predictors (Sheehan 392).

But if there is one set of teaching behaviors that does seem strongly related to effective instruction, it is the set of behaviors related to course difficulty, stress, overload, work-load, and task-orientation. These terms refer to the pressure and demands put on students by the instructor. As Sullivan puts it, "positive answers to items associated with 'task orientation,' for example, 'expects too much from students' and 'lectures present too much material' were not related to favorable evaluations but were associated with a high level of student achievement" (588). According to McKeachie, "many students prefer teaching that enables them to listen passively--teaching that organizes the subject matter for them and that prepares them well for tests." But cognitive and motivational research shows that students retain more, think better, and are more highly motivated when they are "more actively involved in talking, writing, and doing. Thus, some teachers get high ratings for teaching in less than ideal ways" (McKeachie "Validity" 1219).

There is another problem with this global item: it is particularly sensitive to how students feel about their grades. As Powell explains, "students rate instructors on the basis of a global impression which they form ('liking').... The present findings show that this impression...are strongly influenced by the grade the student receives from the instructor" (Powell "Grades" 200-01). Shingles agrees: "Students' satisfaction with the grading procedures of the professor is the only factor to have a strong, consistent relationship with all three dimensions and the overall rating; the more satisfied students are with their grades and the grading process, the more favorably they view their teachers. This one issue explains approximately 40% of the variance in the overall rating, suggesting that students' evaluations of teaching effectiveness are strongly influenced by how well they perceived they in turn are being evaluated by the instructor" (465).

This global item, then, does not measure in any meaningful, cogent way the effectiveness of the instructor, but rather the extent to which students think the instructor was effective, from their point of view and in terms of their desires. A self-proclaimed "proponent" of ratings has come to the same conclusion: "ratings are undeniably a measure of the satisfaction of learners with their learning experience...more than they are a direct or absolute measure of the total quality of instruction" (Theall "On" 3). Most forms are not measures of effective classroom teaching but student satisfaction surveys, though few administrators are honest enough to call them that.

Because many students are "satisfied" with something less than the best education their minds are capable of handling, this item, as with other items on the Knapp form, may do more to undermine classroom instruction than to improve it, because it invites instructors who want high summative evaluation scores to please and appease students. One way to do this is by lowering standards and workloads. A high score on this item, then, should raise suspicions, because it could indicate that the instructor is not truly promoting effective student learning. Because this item could contribute to the dumbing down of classroom instruction, some experts on evaluations have argued that global items--loved by administrators looking for a simple, bottom-line number--should be excluded from evaluation forms (Marsh).

Conclusion

Made up of ambiguous and/or trivial items, subject to all kinds of biasing variables inimical to sound teaching, unvalidated yet used to help determine retention, tenure, promotion, and merit pay, the Knapp form, as an instrument for measuring effective classroom instruction, is a joke. A bad joke. On instructors, students, and Montana taxpayers.

How did this joke come to be taken so seriously? Well, when the Knapp form was introduced in the '70s, there was a national movement to have students evaluate their instructors. It seemed the decent and educationally correct thing to do. To resist this trend seemed churlish and undemocratic. At the time, few foresaw how powerful SNEFs would become, or how they might compromise the integrity and rigor of classroom instruction.

But this really doesn't explain why administrators, faculty and students at MSU continue to engage in an enterprise so obviously harmful to classroom instruction. To understand this, one must understand how the Knapp form serves the needs of these three constituents.

Most students accept the Knapp form because it is indeed a student satisfaction survey. It gives them a chance to settle scores with "uncomfortable," "overly demanding instructors." They recognize that this form allows them to pressure instructors to go easy, give out high grades, and make things generally "comfortable" for easily stressed students (see Trout "What").

Administrators are happy with the Knapp form for a couple of reasons. Borrowed from another school, it was cheap to institutionalize, since no one on the faculty or in the administration demanded that it undergo validation, an expensive process. Since so many departments have accepted the form, administrators have no incentive to abandon it, especially since it serves to reassure taxpayers, regents and legislators that teaching is evaluated and "good" teaching encouraged and rewarded. A more cogent form would be expensive to develop, and perhaps to administer.

And administrators like the Knapp form for the same reason students do: it is sensitive to students' likes and dislikes. Administrators want the students, and their tuition-paying parents, to be happy; happy students are easier to retain and deal with. What makes most students happy are friendly, entertaining instructors with undemanding requirements and standards. The Knapp form, because it measures student satisfaction, is the perfect device for inducing instructors to comply with student demands (8). This is why the Knapp form (and almost every other evaluation form) does not ask students to rate whether the course was demanding, whether the assignments were challenging, the tests difficult and probing, and the standards high (all aspects of instruction linked to increased student learning; see Sheehan 393). Such items could provide instructors with an incentive to put more academic pressure on students, which most students would not appreciate. In short, administrators want the Knapp form because it induces faculty to be more concerned with pleasing their increasingly under-prepared and intellectually disengaged "customers" than with challenging or otherwise discomfiting them (see Platt 446-47; Hocutt 61; Haskell, 5.6 10; Damron "Three" 19; Damron "Instructor" 7).

The Knapp form even serves the interests of the faculty. Most instructors figure out how to manage their behavior to maximize their scores, or at least to avoid disastrously low ones. Instructors adept at manipulating the form are able to believe that their high scores indicate that they taught well. And instructors who don't score well can blame it on a patently flawed form! Once a critical mass of instructors figures out how to manipulate the form and spin the results, there is no momentum to adopt a new, and more valid, form.

And some faculty members may not find the consequences of using the Knapp form nearly as "harmful" as I claim. They may be quite willing to acquiesce to student/administrative pressures for education lite, not only because acquiescence raises evaluation scores but because less work for the students usually means less work for instructors, too. In the context of paltry wages and frozen raises, how many instructors really want to work harder?

Though vacuous and mischievous, the Knapp form is widely accepted at Montana State University because, in essence, it serves the self-interest of students, administrators, and instructors.

Perhaps this explains why the halls don't resound with laughter every time this form is handed out.

Notes

Tagomori's research supports the claim by critics of evaluations that the items on them are ambiguously phrased and/or subject to the student's interpretation. He found that "more than 90 percent of the instrument contained evaluation items that were ambiguous, unclear, or vague; 76 percent contained subjectively stated items, and over 90 percent contained evaluation items that did not correlate with classroom teaching behavior. Almost three-fifths of the evaluation instruments contained skewed, ambiguous, or unclear responses to evaluation items, or responses that did not correspond to the question." He concludes by saying that this finding "raises serious doubts about the clarity of written evaluation items and their applicability to 'accepted aspects of good teaching' used in many instruments administered by American colleges and universities" (Tagomori & Bishop 74, 75; see also Peterson 10).
This definition does not exclude other ways of showing concern, however. Most experienced instructors would also say that this item measures the degree to which the instructor has maintained congenial relationships with students, was sensitive to their requests and needs, treated them fairly and impartially, and behaved courteously by keeping appointments, listening to their opinions, and bending regulations and requirements in emergency situations--values and behaviors that almost all instructors embrace and attempt to exemplify.
Here's a recent anecdote that embodies precisely this kind of concern. Daniel Kaufman was teaching a humanities course to minority students when the students protested the rigor of the course by collectively skipping class one day. He prepared a short speech for the next session. "I explained that the reason my course is so difficult is that I respect my students as human beings and think that they are just as worthy of a quality education as rich white kids at Harvard. To give them kid-glove treatment would be to 'dis' them, insofar as it would imply that they were inferior and incapable of handling the kind of work that white people can do. But respect entails responsibility, and I laid out precisely what my responsibilities to them were.... Then I went over their responsibilities to me and to themselves to work to the best of their ability and to come to class and conduct themselves with civility and dignity" (57). The students knuckled down to studying Plato, and the course ended happily and successfully.
Although students find the instructor's "warmth" to be crucial, McKeachie contends that this attribute (as well as other "presumed essentials of good teaching") is not highly related to student achievement or highly valid (McKeachie "Reprise" 392).
Here is a telling quotation from a student that captures the conflict within those students who realize that their tough prof, alas, did make a difference: "With a huge grudge in my heart I have to thank him for all he has taught me" (in Brown 1). Will these students rate the instructor's "concern" high, or low? And what would administrators learn about the quality of the instructor's classroom teaching from the number?
Some SNEFs explicitly ask students to measure the extent to which the teacher was "friendly." "It is singularly unhelpful to learn that a group of students believes an instructor is (or is not) friendly; good teachers are not necessarily overly friendly to students" (Dressel 346).
"When I informed one administrator that I was going to change my grading system from my already drastically modified bell curve to the campus norm of giving mostly A's and B's, because I just could no longer 'compete for students in my courses, he soberly looked me straight in the eye and firmly said, 'That's a good idea'" (Haskell 5.6 27 #31). "Every failing grade costs the school money, as more students drop out or flunk out" (Berman 43).
See Note 7. Here's more: "I have been teaching for twenty years, and received good student evaluations and have earned tenure twice at institutions that rely heavily on SEF. I recently experienced my first student 'revolt.' The students were under pressure not to get two D's. My course was the last course before they were formally admitted into the major. I received no support from administration, including the department chair. Moreover some of the vocational faculty..., making it known to the students, complained to administration about my course" (Haskell, 5.6, 25, #18). "Rather than unfettered excellence in post secondary education, the overarching institutional agenda revealed by such practices is classroom marketability, elevated enrollments, and very high consumer satisfaction..." (Damron "Instructor" 6).

Works Cited

Armour, Robert. "What Do They Expect of Me?" The Chronicle of Higher Education 15 October 1979: A52.

Bauer, Henry H. "The New Generations: Students Who Don't Study." An unpublished paper delivered at the annual meeting of AOAC International, 10 September 1996.

Berman, Robert. "Making Up the Grade: Notes from the Antiversity." Academic Questions 11.2 (Spring 1998): 38-52.

Cantor, Paul A. "It's Not the Tenure, It's the Radicalism." Academic Questions 11.1 (Winter 1997-98): 28-36.

Damron, John C. "Instructor Personality and the Politics of the Classroom." October 1996. Online Posting <john_damron@mindlink.bc.ca>.

Damron, John C. "The Three Faces of Teaching Evaluation." 1995. Online Posting <john_damron@mindlink.bc.ca>.

d'Apollonia, Sylvia, and Philip C. Abrami. "Navigating Student Ratings of Instruction." American Psychologist 52.11 (November 1997): 1198-1208.

Dressel, Paul L. Handbook of Academic Evaluation. San Francisco: Jossey-Bass, 1976; especially Chapter 15 "Faculty" 331-375.

Edmundson, Mark. "On the Uses of A Liberal Education." Harper's Magazine (September 1997): 39-49.

Goldman, Louis. "The Betrayal of the Gatekeepers: Grade Inflation." JGE: The Journal of General Education 37.2 (1985): 97-121.

Goulden, Nancy Rost, and Charles J. G. Griffin. "The Meaning of Grades Based on Faculty and Student Metaphors." Communication Education 44.2 (April 1995): 110-125.

Greenwald, Anthony G., and Gerald M. Gillmore. "Grading Leniency Is a Removable Contaminant of Student Ratings." American Psychologist 52.11 (November 1997): 1209-1217.

Greenwald, Anthony G., and Gerald M. Gillmore. "No Pain, No Gain? The Importance of Measuring Course Workload in Student Ratings of Instruction." Journal of Educational Psychology 89.4 (1997): 743-751.

Haskell, Robert E. "Academic Freedom, Tenure, and Student Evaluation of Faculty: Galloping Polls in the 21st Century." Education Policy Analysis Archives 5.6 (12 February 1997): 1-30. Online Posting <http://olam.ed.esu.edu/apas>.

Hocutt, Max O. "De-Grading Student Evaluations: What's Wrong with Student Polls of Teaching." Academic Questions 1.4 (Autumn 1988): 55-65.

Kaufman, Daniel A. "Why Minority Kids Should Get the Classics Too (and how to do it)." Academic Questions 11.2 (Spring 1998): 53-62.

Kleeberg, Richard N. Letter to the Editor. The Chronicle of Higher Education 1 September 1993: B4.

Knapp, Stuart, Dean Drenk, William Locke, and Shannon Taylor. "Evaluation of Teaching at Montana State University: Final Report to Lilly Foundation Conference on the Liberal Arts." 16 June 1987.

Kolevzon, Michael S. "Grade Inflation in Higher Education: A Comparative Study." Research in Higher Education 15.3 (1981): 195-212.

McIntyre, Charles, Warren Seibert, and Richard Owens. "Evaluation of College Teachers." Criteria for the Evaluation, Support, and Recognition of College Teachers (A Special Publication of the Fund Associates in National Project III, prepared at the Center for Research on Learning and Teaching, The University of Michigan) Number 6 (May 1977). [6 pp.]

McKeachie, Wilbert J. "Student Ratings: The Validity of Use." American Psychologist 52.11 (November 1997): 1218-1225.

McKeachie, Wilbert J. "Student Ratings of Faculty: A Reprise." Academe 65 (October 1979): 384-397.

Machlup, Fritz. "Poor Learning from Good Teachers." Academe 65 (October 1979): 376-380.

Marques, Todd E., David M. Lane, and Peter W. Dorfman. "Toward the Development of a System for Instructional Evaluation: Is There Consensus Regarding What Constitutes Effective Teaching?" Journal of Educational Psychology 71.6 (1979): 840-849.

Marsh, Herbert W., and Lawrence A. Roche. "Making Students' Evaluations of Teaching Effectiveness Effective: The Critical Issues of Validity, Bias, and Utility." American Psychologist 52.11 (November 1997): 1187-1197.

Meier, Robert, and John F. Feldhusen. "Another Look at Dr. Fox: Effect of Stated Purpose for Evaluation, Lecturer Expressiveness, and Density of Lecture Content on Student Ratings." Journal of Educational Psychology 71.3 (1979): 339-345.

Naftulin, Donald H., John E. Ware, Jr., and Frank A. Donnelly. "The Doctor Fox Lecture: A Paradigm of Educational Seduction." Journal of Medical Education 48 (July 1973): 630-635.

Neath, Ian. "How to Improve Your Teaching Evaluations Without Improving Your Teaching." Psychological Reports 78 (1996): 1363-1372.

Overall, J. U., and Herbert W. Marsh. "Midterm Feedback From Students: Its Relationship to Instructional Improvement and Students' Cognitive and Affective Outcomes." Journal of Educational Psychology 71.6 (December 1979): 856-865.

Peterson, Donovan. "Legal and Ethical Issues of Teacher Evaluation: A Research Based Approach." Educational Research Quarterly 7.4 (Winter 1983): 6-16.

Platt, Michael. "Souls Without Longing." Interpretation 18.3 (Spring 1991): 415-58.

Powell, Robert. "Faculty Rating Scale Validity: The Selling of a Myth." College English 39.5 (January 1978): 616-629.

Powell, Robert W. "Grades, Learning, and Student Evaluation of Instruction." Research in Higher Education 7 (1977): 193-205.

Sacks, Peter. Generation X Goes to College. Chicago: Open Court, 1996.

Seldin, Peter. "The Use and Abuse of Student Ratings of Professors." The Chronicle of Higher Education 21 July 1993: A40.

Sheehan, Daniel S. "On the Invalidity of Student Ratings for Administrative Personnel Decisions." Journal of Higher Education 46.6 (November/December 1975): 687-700.

Shingles, Richard D. "Faculty Ratings: Procedures for Interpreting Student Evaluations." American Educational Research Journal 14.4 (Fall 1977): 459-70.

Sullivan, Arthur M., and Graham R. Skanes. "Validity of Student Evaluation of Teaching and the Characteristics of Successful Instructors." Journal of Educational Psychology 66.4 (1974): 584-90.

Tagomori, Harry T., and Laurence A. Bishop. "Student Evaluation of Teaching: Flaws In the Instruments." The NEA Higher Education Journal 11 (1995): 63-78.

Theall, Michael. "On Drawing Reasonable Conclusions About Student Ratings of Instruction: A Reply to Haskell and to Stake." Education Policy Analysis Archives 5.8 (21 May 1997): 6.

Theall, Michael, and Jennifer Franklin. "Using Student Ratings for Teaching Improvement." Effective Practices for Improving Teaching Ed. Michael Theall and Jennifer Franklin. San Francisco: Jossey-Bass, 1991. 83-96.

Trout, Paul. "How to Improve Your Teaching Evaluation Scores Without Improving Your Teaching." The Montana Professor 7.3 (Fall 1997): 17-22.

Trout, Paul. "Student Anti-Intellectualism and the Dumbing Down of the University." The Montana Professor 7.2 (Spring 1997): 4-10.

Trout, Paul. "What Students Want: A Meditation on Course Evaluations." The Montana Professor 6.3 (Fall 1996): 12-19.

Williams, Wendy M., and Stephen J. Ceci. "'How'm I Doing?' Problems With Student Ratings of Instructors and Courses." Change September/October 1997: 13-23.

Wilson, Tom C. "Student Evaluation-of-Teaching Forms: A Critical Perspective." The Review of Higher Education 12.1 (Autumn 1988): 79-95.

Wilson, Robin. "New Research Casts Doubt on Value of Student Evaluations of Professors." The Chronicle of Higher Education 16 January 1998: A12-14.

Wilson, Robin. "Project Seeks to Help Colleges Use Peer Review to Evaluate Teaching." The Chronicle of Higher Education 15 January 1998: A14.

Wooldridge, Adrian. "A True Test: In Defense of the SAT." The New Republic 15 June 1998: 18-21.
