James Rovira

"Every thing that lives is holy…"

Evaluating Course Evaluations

There’s an interesting recent study out of UC Berkeley evaluating how well student course evaluations measure teaching effectiveness. Its results echo those of many earlier studies: student course evaluations are not reliable indicators of teaching effectiveness. From the abstract:

Student ratings of teaching have been used, studied, and debated for almost a century. This article examines student ratings of teaching from a statistical perspective. The common practice of relying on averages of student teaching evaluation scores as the primary measure of teaching effectiveness for promotion and tenure decisions should be abandoned for substantive and statistical reasons: There is strong evidence that student responses to questions of “effectiveness” do not measure teaching effectiveness. Response rates and response variability matter. And comparing averages of categorical responses, even if the categories are represented by numbers, makes little sense. Student ratings of teaching are valuable when they ask the right questions, report response rates and score distributions, and are balanced by a variety of other sources and methods to evaluate teaching.
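To make the abstract's statistical point concrete, here is a minimal Python sketch. The two sections, their enrollments, and every rating in it are invented for illustration; they are not data from the study. Both sections average exactly 4 on a 7-point scale, yet one class is uniformly lukewarm and the other is split between the extremes, and the two have very different response rates.

```python
from collections import Counter
from statistics import mean

# Hypothetical ratings on a 1-7 scale for two imaginary sections.
# These numbers are made up for illustration only.
section_a = [4] * 10            # every respondent picks the middle category
section_b = [1] * 5 + [7] * 5   # respondents split between the two extremes


def summarize(ratings, enrolled):
    """Report the mean together with the response rate and the full distribution."""
    counts = Counter(ratings)
    return {
        "mean": mean(ratings),
        "response_rate": len(ratings) / enrolled,
        # Count of responses in each category 1..7, including empty ones
        "distribution": {k: counts.get(k, 0) for k in range(1, 8)},
    }


a = summarize(section_a, enrolled=40)  # 10 of 40 students responded
b = summarize(section_b, enrolled=12)  # 10 of 12 students responded

print(a)
print(b)
```

The averages are identical, so a report that shows only the mean would treat these sections as equivalent. Reporting the distribution and response rate, as the abstract recommends, shows that section B's students were polarized and that only a quarter of section A's class answered at all.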

What do student course evaluations measure, then? The authors of this study summarize the findings of previous studies here:

Student teaching evaluation scores are highly correlated with students’ grade expectations (Marsh and Cooper 1980; Short et al. 2012; Worthington 2002).
- WHAT THIS MEANS: If you’re an instructor and want high course evaluations, pass out As like candy.
- Adjunct instructors, having the least job security and the most job retention anxiety, are most likely to inflate grades to get high course evaluations.
- Net result: over-reliance on adjunct instructors and on student course evaluations to evaluate teachers leads to grade inflation and low course rigor; i.e., poor educational quality.
Effectiveness scores and enjoyment scores are related. In a pilot of online course evaluations in the UC Berkeley Department of Statistics in Fall 2012, among the 1486 students who rated the instructor’s overall effectiveness and their enjoyment of the course on a 7-point scale, the correlation between instructor effectiveness and course enjoyment was 0.75, and the correlation between course effectiveness and course enjoyment was 0.8.
- WHAT THIS MEANS: If students enjoy a course, they rate it highly. But enjoyment by itself isn’t a measure of learning. The instructor may just be a good performer.
- Conversely, lack of enjoyment doesn’t mean the student didn’t learn. The types of assessments and activities that promote long-term retention, in fact, lead to low course evaluations. The practices that students like the least actually help them learn and retain the most.
Students’ ratings of instructors can be predicted from the students’ reaction to 30 seconds of silent video of the instructor: first impressions may dictate end-of-course evaluation scores, and physical attractiveness matters (Ambady and Rosenthal 1993).
- WHAT THIS MEANS: Student course evaluations are, more than anything else, superficial measures of instructor popularity.
Gender, ethnicity, and the instructor’s age matter (Anderson and Miller 1997; Basow 1995; Cramer and Alexitch 2000; Marsh and Dunkin 1992; Wachtel 1998; Weinberg et al. 2007; Worthington 2002).
- WHAT THIS MEANS: Student course evaluations are, more than anything else, racist, elitist, ageist, and sexist superficial measures of instructor popularity.

So how do we rate teaching effectiveness? I’d recommend the following:

Worry less about evaluating the teacher for promotion and focus instead on gauging effectiveness in order to identify the strategies that work best for that specific student population.
Rely in part on peer evaluations conducted by teachers in the same field. Field-specific knowledge matters, as teaching isn’t just a matter of technique but of careful selection of content.
We still do want to hear from students, of course, so use course evaluation tools that focus on teaching effectiveness, such as those provided by the IDEA Center.

Just for the record, I’m an engaging instructor who generally gets high course evaluations, so I’m not worried about myself here. I am, however, worried about how effectively students are being educated. Reliance on student course evaluations, at present, is working against educational quality.

The full study is available on Scribd.