Thinking about Teaching Evaluations

Patty deWinstanley, Associate Professor of Psychology, prepared these points for the Committee on Teaching in May 2000, based on her reading of the extensive literature on teaching evaluations. She focused predominantly on three literature reviews cited below.

Cashin, W. E. (1995). Student Ratings of Teaching: The Research Revisited. Center for Faculty Evaluation and Development, IDEA Paper No. 32.

Aleamoni, L. M. (1999). Student rating myths versus research facts from 1924 to 1998. Journal of Personnel Evaluation in Education, 13(2), 153-166. (Provided to COT in Spring 2000.)

Pratt, D. D. (1997). Reconceptualizing the evaluation of teaching in higher education. Higher Education, 34, 23-44.

1. Students can make reliable and valid judgements about an instructor and certain aspects of instruction.

A. Reliability

Just as you would throw away a bathroom scale that gave you a different measure of your weight every time you stepped on it, an evaluation form with low reliability should also be thrown away. Fortunately, under best-case conditions, student evaluation forms have been shown to be reliable.

Reliability refers to the consistency, stability, and replicability of a measurement.

The consistency of student evaluations refers to the extent that students within the same class give similar ratings on a given question. Good consistency is achievable with class sizes greater than 30. Class sizes of 10 or fewer students will probably not produce adequate consistency.
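One way to see why class size matters for consistency is the Spearman-Brown formula, a standard result in measurement theory for the reliability of an average of several ratings. The sketch below is illustrative only: the single-student reliability of 0.2 is an assumed value, not a figure from the reviews cited here, and the formula assumes that students act as roughly interchangeable raters.

```python
def spearman_brown(single_rater_r: float, n_raters: int) -> float:
    """Reliability of the mean of n_raters ratings (Spearman-Brown formula)."""
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    return n_raters * single_rater_r / (1 + (n_raters - 1) * single_rater_r)


# ASSUMPTION: 0.2 is an illustrative single-student reliability,
# chosen for this example and not taken from the literature.
ASSUMED_SINGLE_STUDENT_R = 0.2

for class_size in (5, 10, 20, 30, 50):
    r = spearman_brown(ASSUMED_SINGLE_STUDENT_R, class_size)
    print(f"class size {class_size:2d}: reliability of class mean = {r:.2f}")
```

Under this assumed value, the reliability of the class average rises from about 0.71 with 10 students to about 0.88 with 30, which is in line with the direction of the class-size guidance above.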

The stability of student evaluations refers to the agreement among raters over time. Student evaluations tend to be fairly stable. Thus, one can expect to see good agreement between ratings at the end of the semester and ratings given by those same students years after graduation. Our institution spends a lot of time and effort surveying graduates about teaching effectiveness for tenure decisions. Because student evaluations are so stable, the literature suggests that these surveys yield little, if any, new information.

The replicability of student evaluations refers to the extent that the same instructor receives similar ratings for the same course over a number of semesters and across all of his or her courses. Replicability is high both for the same course over a number of semesters and for different courses taught by the same instructor.

Cashin (1995) provides the following guidelines for assuring that acceptable levels of reliability are achieved for student evaluations when making personnel decisions.

1. Reliability will be achieved only to the extent that the surveys are well designed; thus, forms should be developed in consultation with someone knowledgeable about educational measurement.

2. Reliability will be achieved when using "ratings from a variety of courses, for two or more courses from every term for at least two years, totaling at least five courses." If there are fewer than 15-20 students in any class, data from additional classes are recommended.
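The second guideline can be read as a checklist. The following is a minimal sketch of one possible reading, not Cashin's own procedure: the `CourseRating` record, the function name, the course labels, and the cutoff of 15 students (the low end of the 15-20 range) are all assumptions made for illustration. It also cannot detect terms for which no ratings were collected at all.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CourseRating:
    """One rated course offering (hypothetical record layout)."""
    year: int
    term: str
    course: str
    n_students: int


def check_rating_set(ratings, min_class_size=15):
    """Return a list of ways the rating set falls short of the guideline."""
    problems = []
    if len(ratings) < 5:
        problems.append("fewer than five courses in total")
    if len({r.year for r in ratings}) < 2:
        problems.append("ratings cover fewer than two years")
    # Count rated courses per (year, term) to check "two or more per term".
    per_term = defaultdict(int)
    for r in ratings:
        per_term[(r.year, r.term)] += 1
    for (year, term), count in sorted(per_term.items()):
        if count < 2:
            problems.append(f"only {count} course rated in {term} {year}")
    # Small classes call for data from additional classes.
    for r in ratings:
        if r.n_students < min_class_size:
            problems.append(
                f"{r.course} ({term_label(r)}) has only {r.n_students} students"
            )
    return problems


def term_label(rating):
    """Format a rating's term for messages, e.g. 'spring 1999'."""
    return f"{rating.term} {rating.year}"


# Hypothetical portfolio; course names and enrollments are invented.
portfolio = [
    CourseRating(1998, "fall", "Course A", 40),
    CourseRating(1998, "fall", "Course B", 25),
    CourseRating(1999, "spring", "Course A", 35),
    CourseRating(1999, "spring", "Course C", 12),
    CourseRating(1999, "fall", "Course B", 22),
]

for problem in check_rating_set(portfolio):
    print(problem)
```

For this invented portfolio the check flags the single rated course in fall 1999 and the 12-student class, so a reviewer would want ratings from additional classes before drawing conclusions.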

Aleamoni (1999) echoes Cashin's suggestions and further emphasizes the importance of consultation in the construction of the evaluation forms: "It should be noted, however, that wherever student rating forms are not carefully constructed with the aid of professionals, as in the case of most student- and faculty-generated forms, the reliabilities may be so low as to negate completely the evaluation effect and its results."

B. Validity

Although you might not throw away a scale that always reported your weight at ten pounds lighter than every other scale that you have stepped on, you would know that the scale isn't a valid measure of your weight. A scale can be highly reliable (always giving you the same weight) but not valid (the weight is really ten pounds under your actual weight). Student evaluations can also be reliable (in the ways mentioned), but not valid. That is, student evaluations might not measure "effective teaching."
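The scale analogy can be made concrete with a small simulation. This is only a sketch: the true weight, the noise levels, and the number of readings are invented values chosen for illustration. The spread of repeated readings stands in for reliability, and the average distance from the true weight (the bias) stands in for validity.

```python
import random
import statistics

TRUE_WEIGHT = 150.0  # hypothetical true weight in pounds


def weigh(rng, offset, noise_sd, n_readings=50):
    """Simulate repeated readings from a scale with a fixed offset and random noise."""
    return [TRUE_WEIGHT + offset + rng.gauss(0, noise_sd) for _ in range(n_readings)]


def summarize(readings):
    """Return (spread, bias): low spread = reliable, low bias = valid."""
    spread = statistics.stdev(readings)
    bias = statistics.mean(readings) - TRUE_WEIGHT
    return spread, bias


rng = random.Random(0)  # fixed seed so the run is repeatable

# A scale that always reads about ten pounds light: reliable but not valid.
consistent_but_off = weigh(rng, offset=-10.0, noise_sd=0.2)
# A scale that is right on average but jumps around: not reliable.
erratic = weigh(rng, offset=0.0, noise_sd=8.0)

for label, readings in [("consistent but off", consistent_but_off),
                        ("erratic", erratic)]:
    spread, bias = summarize(readings)
    print(f"{label:>18}: spread = {spread:5.2f} lb, bias = {bias:6.2f} lb")
```

The first scale shows a tiny spread with a bias near minus ten pounds, while the second shows a large spread. Checking reliability alone would never reveal the first scale's problem, which is why validity has to be examined separately.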

Validity refers to the extent that student evaluations measure what we want them to measure, that is, good teaching. Several studies reported in the literature indicate that student evaluations can be valid measures of some aspects of teaching effectiveness. To illustrate, student ratings have been found to correlate with final exam performance, instructors' self-ratings, ratings by colleagues, and ratings by administrators. In addition, numerical ratings tend to correlate well with student comments on open-ended questions.

2. Some variables that are unrelated to teaching effectiveness do correlate with student evaluations. In addition, some variables that have been purported to correlate with student ratings do not.

When considering student evaluations as part of a personnel evaluation, the variables that are unrelated to teaching effectiveness but do correlate with student evaluations should be taken into consideration. The variables listed below as correlating with student evaluations are the ones for which a consistent pattern based on many studies has been obtained.

A. Elective courses are rated higher than required courses.

B. More advanced students give higher ratings than less advanced students.

C. Grades are weakly correlated with student ratings: Higher grades are associated with somewhat higher ratings.

D. Humanities courses receive higher ratings than social science courses, and social science courses receive higher ratings than science courses.

The variables listed below are the ones that many people believe are correlated with student ratings, but for which inconsistent results have been found.

A. Size of the class (although keep in mind the issue of reliability when class size falls below 15).

B. Gender of the student.

C. Gender of the instructor.

D. An interaction between gender of the student and gender of the instructor.

E. Time of day that the course is offered.

F. Whether students are majors or non-majors.

G. Rank of instructor.

Information regarding the types of variables that have an impact on student evaluations must be kept in mind when comparing evaluations from different courses. At the very least, department heads, council, and deans should be aware of how student evaluations are affected by variables that we do not consider important to teaching effectiveness. Furthermore, the information provided to the persons making personnel decisions must be periodically updated. The research on student evaluations is very active, and more researchers are beginning to investigate the interactions of several variables on student evaluations. To ensure appropriate interpretations of the evaluations, up-to-date information must be provided.

3. Student evaluations are multidimensional. Contrary to some people's perceptions, student evaluations are not simply measuring popularity. Most researchers show that at least six dimensions, or factors, are commonly found in student rating forms. Below is a list of the factors. Any student evaluation form must have a few questions dedicated to assessing each of the six factors.

A. Course Organization

B. Clarity, communication skills

C. Teacher/student interaction, or rapport

D. Course difficulty, workload

E. Grading and examinations

F. Student self-rated learning

All authors of the review articles cautioned that a single overall (or general) measure of teaching effectiveness is inadequate because single items are not reliable or valid. Furthermore, single items, such as "In general, how would you rate this teacher's effectiveness?", tend to correlate with many more of the factors that are unrelated to teaching effectiveness (e.g., gender, class size).

4. All authors of the review articles state that student evaluations must be used in conjunction with other methods of evaluating teaching. Pratt (1997) lists seven principles for evaluating teachers in a broader approach that includes student evaluations as only one aspect of teaching evaluations.

The seven principles are as follows:

A. Evaluation should acknowledge and respect diversity in actions, intentions, and beliefs.

B. Evaluation should involve multiple and credible sources of data.

C. Evaluation should assess substantive, as well as technical, aspects of teaching.

D. Evaluation should consider planning, implementation, and results of teaching.

E. Evaluation should contribute to the improvement of teaching.

F. Evaluation should be done in consultation with key individuals responsible for taking data and recommendations forward within an institution.

Comments to: Patty.deWinstanley or COT via Steve Volk