Colleges Are (finally) Getting Smarter About Student Evaluations

That’s the news from the Chronicle of Higher Ed today, in an article by Kristin Doerer (gated if off campus, some clips below), complete with a photo of some squirrelly-looking economist:

Well, economists do have some experience with the misuse of metrics. From the article:

Emily Wu and Kenneth Ancell, two students at the University of Oregon, approached their honors research professor, Bill Harbaugh, a few years ago about studying the relationship between student evaluations and grade inflation. Harbaugh, a professor of economics, was enthusiastic. Wu and Ancell dived into the university’s extensive data on evaluation and transcripts, focusing on its two largest schools, journalism and business.

What they found surprised them. “Having a female instructor is correlated with higher student achievement,” Wu said, but female instructors received systematically lower course evaluations. In looking at prerequisite courses, the two researchers found a negative correlation between students’ evaluations and learning. “If you took the prerequisite class from a professor with high student teaching evaluations,” Harbaugh said, “you were likely, everything else equal, to do worse in the second class.”

The team found numerous studies with similar findings. “It replicates what many, many other people found,” said Harbaugh. “But to see it at my own university, I sort of felt like I had to do something about it.”

He did. In the spring of 2017, Harbaugh assembled a task force on the issue and invited Sierra Dawson, now associate vice provost for academic affairs, to join.

The UO Provost’s website on the reform process is here. We are piloting new surveys now and the Senate expects to have them in place by next fall. Back to the Chronicle article:

Legal Pressure

Doing nothing to revise or phase out student evaluations could be a risky proposition not just educationally, but also legally.

In August, an arbitrator ruled that Ryerson could no longer use student evaluations to gauge teaching effectiveness in promotion-and-tenure decisions. The Ryerson Faculty Association brought the arbitration case and argued that because of the well-documented bias, student evaluations shouldn’t be used for personnel decisions.

“This is really a turning point,” said Stark, who testified on behalf of the Ryerson faculty group. He thinks the United States will see similar cases. “It’s just a question of time before there are class-action lawsuits against universities or even whole state-university systems on behalf of women or other minorities, alleging disparate impact.” …


10 Responses to Colleges Are (finally) Getting Smarter About Student Evaluations

  1. literature says:

    Bill, can you provide a link to the research that shows evaluations are biased?

    • Fishwrapper says:

      There were two to get you started linked in the article. One is: The Role of Perceived Race and Gender in the Evaluation of College Teaching on

      Landon D. Reid, Colgate University

      The present study examined whether student evaluations of college teaching (SETs) reflected a bias predicated on the perceived race and gender of the instructor…The results of the present study are consistent with the negative racial stereotypes of racial minorities and have implications for the tenure and promotion of racial minority faculty

      The second linked study is: Student evaluations of teaching (mostly) do not measure teaching effectiveness

      Anne Boring, Kellie Ottoboni, and Philip B. Stark

      Student evaluations of teaching (SET) are widely used in academic personnel decisions as a measure of teaching effectiveness. We show:
      SET are biased against female instructors by an amount that is large and statistically significant.
      The bias affects how students rate even putatively objective aspects of teaching, such as how promptly assignments are graded.
      The bias varies by discipline and by student gender, among other things.
      It is not possible to adjust for the bias, because it depends on so many factors.
      SET are more sensitive to students’ gender bias and grade expectations than they are to teaching effectiveness.
      Gender biases can be large enough to cause more effective instructors to get lower SET than less effective instructors.

      I’m sure Bill has lots of other links – my quick queries on the googles suggest there has been a lot of effort put into studying this topic.

  2. literature says:

    Are those the best studies?

    The first study is based on demographic correlations from RateMyProf data. It’s not suited to detect causal relationships and almost certainly has data quality issues. It also fails to find a relationship between ratings and gender.

    The second study is based effectively on comparing the evaluations of two (!) graduate students in an online course.

    These are weak studies. I hope there are better studies if Bill is trying to revolutionize evaluations based on this literature. It would be especially helpful to show that the demographic groups that are most effective at teaching are those that are most likely to be discriminated against, as Bill asserts in the Chronicle piece.

    • Fishwrapper says:

      There is a great deal more literature; these two were linked in the article, so they seemed germane to the query. It did not appear from the article that Bill or other researchers were drawing definitive conclusions based on these two studies, but there has been an argument for some time now that there is a disconnect between SET data and faculty performance and/or learning outcomes. I’ve been privy to faculty griping about evaluation processes for three decades, and the argument is not a new one.

      • response says:

        So one can influence policy these days by simply asserting there’s some ill-defined “literature” out there that supports your position? Or because an “argument” is repeated long enough?

        • uomatters says:

          I suggest you try google scholar, or just go through the references listed in the papers linked to in this post.

          • response says:

            1) I have casually looked. I haven’t seen much that is high quality.
            2) Bill, you’re the one making these claims and attempting to have an influence both at UO and nationally. It’s too much to ask you to back up your assertions? Why are you being defensive? A couple of DOIs to the studies you have in mind would take you a few minutes.

  3. oldtimer says:

    Has the old standard that evaluations of teaching should be based on multiple, convergent evidence been abandoned? Even as a numbers-type person in a numbers field, I always tell my students that a number in isolation means very little. By the way, my favorite randomized studies suggest that students prefer attractive teachers over ugly ones, imagine that. Of course, these would be my favorite to put my evals in context.

  4. Deplorable Duck says:

    Seriously? Isn’t there a better source of data we could use, like graffiti in dorm restrooms?
