Grade Trends and the Connection to Faculty Evaluations

Larry Watson and Jeff Adams
Physics
Montana State University-Bozeman

In recent years, faculty discussions alleging a decline in the cognitive demands placed on students and an accompanying inflation in student grade-point averages have moved from informal coffee-room conversations to formal university forums. Often referred to as the "dumbing down" of the university, this is an important topic that deserves focused attention. However, many discussions of the "dumbing down" truism rely solely on personal experience or anecdotal evidence. This does not mean rigorous research is not being done on grade inflation and student attitudes toward college, but quantitative data on such questions as course difficulty and intellectual demand are inherently difficult to obtain. Further, the generalizability of such data is difficult to establish, so documented trends at one school, or even in an individual department, may be of little value in informing faculty elsewhere.

If one accepts that the "dumbing down" of the curriculum is indeed a feature of the modern American university, it is natural to speculate as to the cause. The factor most often cited as fueling grade inflation is the ubiquitous faculty evaluation form. Sacks (1996) has suggested that the pressure to receive good evaluations from students, given their promotion and tenure ramifications, encourages faculty to give high grades and make courses easier. But does the strategy of reduced expectations and higher grades actually lead to higher evaluations? Citing a number of studies in the literature, Trout (1997) has argued that faculty can improve their evaluations by, among other tactics, reducing cognitive demands on students. This claim is important both for its potential impact on the way we teach and for the way administrators use these forms in making personnel decisions. But, once again, data from other schools or departments are easily dismissed as not generalizing to a particular department or setting.

It is with these issues in mind that faculty in the Research in Physics and Astronomy Education Group in the Department of Physics at MSU-Bozeman, with fiscal support from the MSU-Bozeman Teaching/Learning Committee, undertook a study of grade trends and faculty evaluations. The two main questions addressed were:

  1. Is there a long-term trend in our department towards higher grades for large, introductory physics courses?
  2. Do the data on grades and faculty evaluations for our faculty support a positive correlation between higher grades and improved evaluation scores?

This article reports the results of this study and comments on the implications for teaching and the uses of faculty evaluations within the physics department.

The Study

This study analyzed grade and evaluation data from six physics courses at MSU-Bozeman selected on the basis of size and the availability of data. The six courses were Introductory Astronomy, Descriptive Physics, Algebra Physics I and II, and Calculus Physics I and II. Astronomy is a highly subscribed freshman-level course taken by a wide cross section of students to satisfy a natural-science core requirement. Descriptive Physics is an overview course that also satisfies the natural-science core requirement. Architecture majors and pre-med students take Algebra Physics, while engineering and physics majors take Calculus Physics. All of these are large, lecture-based courses typically enrolling over 100 students per section. We chose to exclude courses taught by graduate students and courses taken predominantly by physics majors, such as upper-division classical mechanics.

We aggregated grades for these courses for the last eighteen years (1978-1996) using the average grade for all students in the class as our measure. We also assembled faculty evaluation data for as many course sections as possible. The evaluation data go back ten years and do not cover every course section in that ten-year period. The evaluations used by the physics department consist of sixteen questions marked on a Likert-scale form with space for written comments on the back. An abridged version of the evaluation instrument used by the physics department is included as Appendix A. The average score for each of the sixteen questions was recorded for every section of every year for which data were available. Written comments were not available.

Evaluations

The Physics Department's faculty evaluation form seeks information about sixteen specific aspects of teaching performance on a scale of one (good) to five (poor). For our purposes it was necessary to assign to each class a single number indicating an overall instructor "score." One way to do this is simply to average the scores for the sixteen questions. However, this does not address whether some questions are more important than others and should therefore be given more weight in creating an aggregate score. For example, an item about prompt grading might not be as important as one dealing with instructor preparedness. To address this issue we circulated a Teaching Performance Priority Survey among physics faculty and 350 students in Introductory Astronomy and Algebra Physics. This survey asked faculty and students to rate how relevant each of the sixteen faculty evaluation questions is as an indicator of teaching performance, on a scale from one (least important) to five (most important).

The results of this survey are summarized in Table 1, which lists the question numbers (see Appendix A for the questions) and the corresponding priority index for the faculty and the two different classes of students. Predictably, students and faculty differ strongly on which questions are important indicators of instructor performance. In fact, students assigned the highest priority index to items seven and eight, which deal with the fairness and appropriateness of exams, while these were the items rated lowest by faculty. The faculty's ranking of the importance of each question was either equal to or lower than the ranking given by students, except for item one, which deals with preparedness. Students and faculty gave similar ratings to questions three (use of examples), five (opportunity to ask questions), thirteen (availability of the instructor outside of class), and fourteen (desire to take another course from the instructor) as indicators of instructor performance.

To reflect these differing judgments, composite averages were created by weighting each of the individual responses to the sixteen questions by the corresponding faculty or student priority indices. Thus, for each course, we created a "faculty-weighted average" and a "student-weighted average" which, along with the raw average, could be used to compare courses. What we discovered, however, is that using the "student-weighted average" or the "faculty-weighted average" did not significantly affect the overall rankings as compared with simply using the raw average. We believe there are two reasons for this lack of discrimination. First, students do not use the whole five-point scale when completing evaluation forms. This is evidenced by the fact that there are few average evaluation scores above three and the average score across all questions is slightly less than two. It seems that students resist giving low marks to professors. (Remember that a five is a "low" mark.) Second, we suspect that students do not discriminate strongly between questions. If a student likes a class, then she or he will give generally high marks without taking the time to think carefully about each question. Given both the number of such forms that students are asked to complete and the fact that the forms are usually administered in the last week of classes, this should not be surprising. All faculty evaluation scores reported here are raw averages.
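
To make the weighting procedure concrete, the following sketch (in Python) shows how the raw and priority-weighted averages might be computed for a single course section. The per-question section scores are invented for illustration; the priority indices are taken from the Introductory Astronomy and faculty columns of Table 1. The function and variable names are ours and are not part of the original evaluation instrument.

    # Sketch of the composite evaluation score described above. The per-question
    # section scores are illustrative placeholders; the priority indices come
    # from Table 1 (Introductory Astronomy students and physics faculty).

    def weighted_average(scores, priorities):
        """Weight each per-question average by its priority index."""
        return sum(s * p for s, p in zip(scores, priorities)) / sum(priorities)

    # Average evaluation score for each of the sixteen questions (1 = good,
    # 5 = poor) for one hypothetical course section.
    section_scores = [1.8, 2.1, 1.9, 2.3, 1.7, 1.6, 2.4, 2.2,
                      1.9, 1.8, 2.0, 2.1, 2.2, 1.9, 2.0, 1.8]

    # Priority indices from Table 1.
    student_priorities = [3.43, 3.18, 3.78, 3.47, 3.42, 3.87, 3.90, 3.78,
                          3.87, 3.27, 3.53, 3.53, 3.69, 3.45, 3.76, 3.07]
    faculty_priorities = [4.21, 3.07, 3.71, 2.93, 3.64, 3.79, 2.50, 2.69,
                          3.86, 2.71, 2.79, 2.86, 3.64, 3.43, 3.07, 2.57]

    raw = sum(section_scores) / len(section_scores)
    student_weighted = weighted_average(section_scores, student_priorities)
    faculty_weighted = weighted_average(section_scores, faculty_priorities)

    print(f"raw: {raw:.2f}  student-weighted: {student_weighted:.2f}  "
          f"faculty-weighted: {faculty_weighted:.2f}")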

Study Results

The first question in the study concerned grade trends in the Department of Physics. The results are summarized in Figure 1. Over the eighteen-year span from 1978 to 1996, the average grade awarded (weighted by class size and reported on a four-point scale) dropped from about 2.8 to about 2.7. No individual course showed an increase in the average grade awarded. Algebra Physics was the most stable, and Astronomy had the greatest drop in grades over this period. One reviewer has suggested that this could be a relaxation effect, largely due to grade inflation that took place prior to 1978. Whatever the cause, we can say that grades in the Department of Physics have remained stable or, if anything, have undergone a slight decline. This result does not imply that courses in physics have become harder or more intellectually demanding. We did attempt to quantify course difficulty by comparing old exams, but this proved problematic due to a lack of records. This issue clearly has merit for future studies.
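
To illustrate the class-size weighting used in Figure 1, the short sketch below computes an enrollment-weighted average grade for a single year. The enrollments and course averages are invented placeholders, not the actual 1978-1996 data.

    # Sketch of the enrollment-weighted department GPA plotted in Figure 1.
    # All numbers below are illustrative placeholders.

    def weighted_gpa(sections):
        """Average grade across course sections, weighted by class size."""
        total_students = sum(enrollment for enrollment, _ in sections)
        return sum(enrollment * gpa for enrollment, gpa in sections) / total_students

    # (enrollment, average grade on a four-point scale) for one year's sections.
    sections = [
        (180, 2.55),  # Introductory Astronomy
        (120, 2.80),  # Descriptive Physics
        (140, 2.75),  # Algebra Physics I
        (110, 2.70),  # Algebra Physics II
        (160, 2.65),  # Calculus Physics I
        (130, 2.72),  # Calculus Physics II
    ]

    print(f"Enrollment-weighted GPA: {weighted_gpa(sections):.2f}")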

Of more interest, however, are the data showing the relationship between average class grade and average evaluation score, which are presented in Figure 2. When we examined the relationship between these two variables, we did find that higher grades are indeed correlated with better evaluations, with a correlation coefficient of r=0.37. This is consistent with the data reported by other studies (see Howard and Maxwell, 1980, and references therein), which are often used as evidence that instructors use grades to boost evaluation scores. While it is clear that a correlation exists, it is important to stress that a correlation does not necessarily imply a causal relationship. For instance, there is a strong correlation between the age of our current department head and the Dow Jones Industrial Average, and yet no one would suggest that the relationship is one of cause and effect. It is interesting to speculate as to other reasons why course evaluation scores and average class grades might be correlated.
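
For departments wishing to repeat this analysis with their own data, the sketch below computes a Pearson correlation coefficient between average class grade and average evaluation score. The paired values are invented for illustration. Note that because a lower evaluation score is better on this form, "higher grades go with better evaluations" appears as a negative raw coefficient; the r=0.37 reported above is, as the Figure 2 caption notes, a magnitude.

    # Sketch of a Pearson correlation between average class grade and average
    # evaluation score. The (grade, evaluation) pairs are illustrative only.
    import math

    def pearson_r(xs, ys):
        """Pearson correlation coefficient between two equal-length sequences."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    # One pair per course section: (average class grade, average evaluation score).
    # Lower evaluation scores are better (1 = excellent, 5 = poor), so a negative
    # coefficient means higher grades accompany better evaluations.
    class_gpas  = [2.5, 2.7, 2.8, 3.0, 2.6, 2.9, 2.4, 3.1]
    eval_scores = [2.3, 1.8, 2.1, 1.7, 1.9, 2.2, 2.0, 1.6]

    print(f"r = {pearson_r(class_gpas, eval_scores):.2f}")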

Grades and Instructor Evaluations

Consider the following hypothetical study. Two sections of the same class are given identical teaching methods, tests, assignments, and so on, but are graded on different scales. That is, for the same quality and quantity of work, one section consistently earns higher grades than the other. Of course, one could never execute such a study, but opinions as to its likely outcome are easy to elicit. It would not surprise anyone if the section graded lower were to give the instructor a poorer evaluation than the section graded higher. This would not necessarily require students to compare their grades directly; it could result instead from a comparison with their expectations based on the reputation and history of the particular course. Indeed, the data showing a positive correlation between evaluations and grades hint at just such a result. But is it necessarily the case that students are simply rewarding instructors for easy grades? Are they simply returning a favor? No. Another explanation is that students are attempting to evaluate the quality of instruction they have received, and clearly this should be related to the quality of their learning, which they could reasonably infer from their grade. After all, a grade is awarded by a professor as a measure of student learning, so, given comparable student effort in two courses, a student may conclude that the course yielding the higher grade was taught by the more effective instructor. Although it allows that teaching evaluations reflect a genuine attempt by students to measure teacher quality, this interpretation does not imply that student evaluations provide an absolute measure of instructor effectiveness.

As thoroughly documented by Perry in his Forms of Intellectual and Ethical Development in the College Years (1970), students' conceptions of the nature and origins of knowledge must progress through a series of positions. This progression begins with what Perry calls basic dualism, in which knowledge is perceived as a set of discrete facts handed down from authorities. A test or examination consistent with this view would rely on questions from only the lowest rungs of Bloom's taxonomy: knowledge and comprehension. To test at higher levels, such as synthesis and evaluation, would be inconsistent with a dualistic epistemology and would thus be perceived by students as "unfair." So students may honestly report that they will rate a course highly even if it is hard, where by "hard" they mean that it requires learning a lot of facts. However, when professors refer to a course as hard, they more often mean that it involves higher-order thinking skills, which, to dualistic learners, just means "trick questions" (Zeilik et al., 1997). This is not to suggest that all courses rated by students as providing valuable learning are necessarily factually oriented and of low cognitive demand, but rather that students' evaluations on such questions as the fairness of exams and the quality of learning must not be considered in isolation.

Evaluation Design

If we assume that students are really trying to give faculty accurate feedback about teaching, which we think they are, then we should be doing all we can to help students express themselves. This means taking the faculty evaluation process seriously and designing evaluation forms that are clear and to the point. The form should certainly elicit student opinion with questions such as "How did you enjoy this course?" or "What is your overall opinion of the instructor?" However, these should be balanced with more objective questions that seek to establish the true learning environment of the course, independent of whether students think it is appropriate. Instead of asking, "Did you think that the exams were fair?" we should be asking, "Did exams require you to integrate concepts from different parts of the course and apply them to a new situation?" Scores for such questions could not be ranked but would instead have to be measured against each instructor's objectives. In addition to providing valuable information to instructors and department heads, such questions would also send a message to students that teaching evaluations are about more than popularity.

Conclusion

We have found that, despite the popular myth to the contrary, grades in our department have not undergone any inflation over the last eighteen years. This result was as much of a surprise to us as it was to the rest of the faculty. It also serves to illustrate the danger of making assumptions that have not been substantiated. We suggest that any department interested in this important debate carry out a similar study so that discussions about grade inflation can be based on hard data for that department rather than on common myth. We have also found that our data support other studies showing a positive correlation between grades and evaluations. Presenting these data to our department's faculty generated a lively discussion that would not have resulted from discussing data gathered elsewhere. We believe, however, that the simple interpretation that students are rewarding faculty for easy grades is highly problematic. We propose an alternative interpretation in which students are genuinely attempting to assess good teaching, using grades as the primary measure of their own learning. While we do not dispute that instructors can elevate evaluation scores by giving out higher grades, we do not believe that this is necessarily a quid pro quo. We believe instead that students are sincerely interested in providing useful feedback but place too much emphasis on grades as a measure of teaching and learning. This interpretation recognizes a basic integrity in the faculty evaluation process, but it also highlights that every evaluation is in some way a reflection of the evaluator's perspective, which must be kept in mind when interpreting the results. More importantly, such evaluation data must be only one of many sources gathered and interpreted in forming an overall picture of teaching performance.

Acknowledgements

The authors would like to thank Greg Francis for assistance with data collection and interpretation and Tim Slater for carefully reviewing this manuscript. This work was supported by a grant from the MSU-Bozeman Teaching/Learning Committee.


References

Howard, George S., and Scott E. Maxwell. "Correlation Between Student Satisfaction and Grades: A Case of Mistaken Causation?" Journal of Educational Psychology 72.6 (1980): 810-820.

Perry, William G. Forms of Intellectual and Ethical Development in the College Years: A Scheme. New York: Holt, Rinehart and Winston, 1970.

Sacks, Peter. Generation X Goes to College. Chicago: Open Court, 1996.

Trout, Paul. "How to Improve Your Teaching Evaluation Scores Without Improving Your Teaching!" The Montana Professor 7.3 (Fall 1997): 17-22.

Zeilik, Mike, et al. "Conceptual Astronomy: A Novel Model for Teaching Postsecondary Science Courses." American Journal of Physics 65.10 (1997): 987-996.

Appendix A

The MSU-Bozeman Department of Physics
Faculty Evaluation Form.

Indicate the correct responses by filling in the circle completely.

1 = Excellent
2, 3, 4 = Average
5 = Poor

Note: Questions 10 through 16 should be answered as:

(1) Yes (2) Yes most of the time (3) Yes and No (4) No most of the time (5) No

  1. Instructor's lectures were prepared and organized in a(n) ____ manner.

  2. To inform us of course objectives, the daily outline of material was ____.

  3. The instructor used frequent illustrations and examples in a(n) ____ way.

  4. A(n) ____ summary on highlights of material was given regularly.

  5. Student felt a(n) ____ opportunity to ask questions in class.

  6. Instructor's enthusiasm for the subject matter was ____.

  7. Exams covered subject matter in a(n) ____ way.

  8. Exams were a(n) ____ test of my knowledge.

  9. All things considered, I rate this instructor ____.

  10. I found this instructor personably likable. ____

  11. The rate of coverage of material was comfortable to me. ____

  12. Assignments were of the right amount for a course at this level. ____

  13. Availability for consultation outside of class. ____

  14. I would enjoy taking another course from this instructor. ____

  15. I feel the grading was fair. ____

  16. Homework and quizzes were returned promptly. ____

Figure and Table Captions

Table 1: Student and faculty priority indices for the sixteen questions appearing on the faculty evaluation form used in the Department of Physics. The values are the average ratings from a questionnaire (see text) ranking the importance of each question on a scale from one (least important) to five (most important).

Table 1

Question    Introductory    Algebra    Physics
 Number      Astronomy      Physics    Faculty
    1           3.43          3.95       4.21
    2           3.18          3.44       3.07
    3           3.78          3.81       3.71
    4           3.47          3.41       2.93
    5           3.42          3.43       3.64
    6           3.87          4.07       3.79
    7           3.90          4.47       2.50
    8           3.78          4.32       2.69
    9           3.87          3.70       3.86
   10           3.27          3.02       2.71
   11           3.53          3.76       2.79
   12           3.53          3.72       2.86
   13           3.69          3.78       3.64
   14           3.45          3.50       3.43
   15           3.76          3.97       3.07
   16           3.07          3.02       2.57

Figure 1: The eighteen-year trend in course GPAs in the Department of Physics. Each data point represents the average over the six courses, weighted by student enrollment.

Figure 2: The correlation between average class evaluation and course GPA. Each data point represents a single course section. The magnitude of the correlation coefficient is r=0.37.
