Are Student Ratings Unfair to Women?
by Neal Koblitz, University of Washington

Reprinted from AWM Newsletter, Vol. 20, No. 5, September-October, 1990.

In the March-April [1990] issue of the AWM Newsletter, I asked for information on whether or not student ratings tend to discriminate against women. The purpose of this article is to report briefly on the response to my query.

I was extremely pleased to receive a large number of quite varied responses. Some people wrote their general impressions and described their personal experiences. Others generously sent me reprints of papers on the subject, or gave me advice on where to look for more material. To my surprise, it turns out that quite a lot has been written on this question, but not in journals which mathematicians normally read (see the bibliography below).

I will not attempt a systematic survey of the research and opinions on the subject. For this the reader is referred to the short list of references below, which includes the papers that I found most interesting (more extensive lists of papers can be found in their bibliographies). Rather, I will summarize my own conclusions based on the material that was sent to me.

A few of the letters I received and some of the early studies indicate that often women receive equal or higher student rating numbers than men. In many situations students perceive (probably correctly) that the women instructors tend to be more sensitive to their needs, more concerned and caring, and more dedicated to teaching than the male instructors (it also helps if the woman is thought to be lenient) — and as a result reward them with higher ratings. This causes some people to conclude that there is little or no discrimination against women in student ratings.

However, a more careful examination of the question shows that the reality is more complex. Note that the traits listed in the last paragraph which may lead to high ratings for women are compatible with sex-stereotyped expectations of women as “mother figures.” According to Kierstead et al. [6], “Taken as a whole, [our] results suggest that if female instructors want to obtain high student ratings, they must be not only highly competent with regard to factors directly related to teaching but also careful to act in accordance with traditional sex role expectations. In particular, … male and female instructors will earn equal student ratings for equal professional work only if the women also display stereotypically feminine behavior.”

Thus, the difficulty for women would tend to occur in cases where instructors have to adopt a “get-tough” approach. Such a situation is much more likely to arise in a math department than, for example, in psychology or sociology, because (1) mathematics departments typically are called upon to perform the role of enforcer of academic standards, with service courses acting as a “weeding out” device for the engineering and science departments, and (2) the discrepancy between students’ high school preparation and study habits and the demands of college work is especially glaring in mathematics.

If an instructor feels compelled to put students under pressure (assigning a lot of homework, giving challenging exams), then only the most serious and mature students are at all likely to respond with high ratings at the end of the course. Most students are inclined to “punish” the instructor. There is considerable evidence that the “punishment” is more severe if the instructor is female.

[According to] Susan Kay’s classroom studies… male students were far more likely to give lower ratings to those female faculty perceived to be hard graders… . This finding is consistent with a series of experiments at the University of Dayton that indicated that college students of both sexes judged female authority figures who engaged in punitive behavior more harshly than they judged punitive… . ([8], pp. 484–485)

See also the studies by Kierstead et al. [6] and Bennett [3], which lead to similar conclusions.

Bennett, in particular, found that women will be rated highly only if they are especially accessible to the students and spend a lot of time with them, while men can receive equally high ratings while remaining more aloof. In other words, students tend to allow men but not women to spend most of their time on research and other non-teaching activities without penalizing them in the ratings: “…male instructors are judged independently of students’ personal experiences of contact and access, whereas female instructors are judged far more closely in this regard. In this sense women are negatively evaluated when they fail to meet this gender appropriate expectation…” ([3], pp. 177–178).

One of the most interesting studies was made in the 1970s by Ellyn Kaschak [5]. Fifty male and fifty female students were given a set of descriptions of the teaching methods and practices of professors in various specialties. In the forms received by half of the students (25 males and 25 females) the professors were given names of the opposite gender from the professors in the forms received by the other half of the students. Kaschak found that the male students were biased against women, while the female students were not.

The possibility of sex discrimination is one complex and controversial aspect of the broader question of the validity of student ratings as a measure of teaching effectiveness. It would take us too far afield to discuss the other problems identified in the many studies that have been conducted. But it is worth noting that math departments are usually put at a special disadvantage if administrators and faculty in other departments have excessive confidence in the meaning of student rating numbers and in the value of cross-department comparisons. A larger proportion of our students take courses as requirements rather than electives and view the subject as difficult. This tends to bring down math department ratings across the board and leads to an unjustified belief on campus that the math department has worse teachers than other departments.

People outside of the mathematical sciences often have a naive faith in the value of numbers and are less aware than we are of the pitfalls in taking raw statistics at face value.

…[S]tudent rating scales are a form of measurement and, according to American Psychological Association standards, should be accompanied by information about the meaning, interpretation, and limitations of the scores — yet most student ratings are not accompanied by such information; [in fact,] promotion and tenure decisions are usually made by an array of administrators and faculty committees who are naive about the standard criteria for measurement instruments, and hence do not know how to interpret the results or do not realize their limitations. ([9], p.88)

In practice, the treatment of student ratings by college administrations varies considerably. On the one hand, McMaster University (Hamilton, Ontario) is among the institutions that have conducted careful studies of the validity of student ratings and seem to have adopted a cautious and sophisticated approach to the subject. At the other extreme, I received letters from two different women in the mathematical sciences at a university in western Canada, complaining bitterly of the unfair and cynical way that administrators at their university are using student ratings as a weapon against the faculty, especially the female faculty.

And at the University of Arizona, the director of an office of “Instructional Research and Development” circulated a tract [1] to faculty members purporting to correct certain “myths” held by sceptics. “Myth 7” is: “Gender of the student and the instructor affect[s] student ratings.” The article proceeds to refute this “myth” by means of a highly selective and distorted citing of the literature. Of course, someone in the math department at the University of Arizona is not likely to be aware of the numerous studies that give convincing support to Myth 7 (none of which are mentioned in [1]), and so could easily be taken in by the self-serving and intellectually dishonest propaganda.

Some Conclusions

  1. Student ratings can provide valuable feedback to the instructor her/himself, but they cannot be properly understood by someone who is not familiar with the nature of the course being rated, the characteristics of the students, and the pedagogical objectives of the instructor.
  2. On the student rating forms, questions which are very specific (e.g., “promptness in correcting exams,” “availability for office hours”) are less likely to invite biased responses than questions of a general nature (“rate the instructor overall”).
  3. In certain teaching situations which are frequently encountered in math departments (especially in introductory-level courses), students tend to discriminate against women instructors on the rating forms.
  4. Math departments and administrators have an ethical and legal obligation not to base promotion and salary decisions on data which are biased against women.


References

  1. Aleamoni, Lawrence M. (Director, Instructional Research and Development, University of Arizona at Tucson), “Student rating: Myth vs. research fact,” Note to the Faculty, Number 16 (October 1985).
  2. Basow, Susan and Nancy Silberg, “Student evaluations of college professors: Are female and male professors rated differently?,” Journal of Educational Psychology, 79(1987), 308–314.
  3. Bennett, Sheila Kishler, “Student perceptions of and expectations for male and female instructors: Evidence relating to the question of gender bias in teaching evaluation,” Journal of Educational Psychology, 74(1982), 170–179.
  4. Hogan, R. Craig, “Review of the literature: The evaluation of teaching in higher education,” Instructional Development Centre, McMaster University, Hamilton, Ontario, 1978.
  5. Kaschak, Ellyn, “Sex bias in student evaluations of college professors,” Psychology of Women Quarterly, 2(3) (Spring 1978), 235–242.
  6. Kierstead, Diane, Patti D’Agostino, Heidi Dill, “Sex role stereotyping of college professors: Bias in students’ ratings of instructors,” Journal of Educational Psychology, 80(1988), 342–344.
  7. Kierstead, Diane et al., “Report of the course evaluation committee,” Colby College, Waterville, Maine.
  8. Martin, Elaine, “Power and authority in the classroom: Sexist stereotypes in teaching evaluations,” Signs: Journal of Women in Culture and Society, 9(1984), 482–492.
  9. Miller, Stanley N., “Student rating scales for tenure and promotion,” Improving College & University Teaching, 32(1984), 87–90.
  10. Unger, R., “Sexism in teacher evaluation: The comparability of real life to laboratory analogs,” Academic Psychology Bulletin, 1(1979), 163–171.

Copyright ©2005 Association for Women in Mathematics. All rights reserved.