Student Evaluation of Teaching
A synopsis of the extensive research literature on student evaluations of teaching follows. The points made below are based predominantly on the three literature reviews cited below. For those interested in more detail, copies of these articles are available at the Center for Excellence in Teaching.
Cashin, W. E. (1995). Student ratings of teaching: The research revisited. IDEA Paper No. 32. Center for Faculty Evaluation and Development.
Aleamoni, L. M. (1999). Student rating myths versus research facts from 1924 to 1998. Journal of Personnel Evaluation in Education, 13(2), 153-166. (Provided to COT in Spring, 2000)
Pratt, D. D. (1997). Reconceptualizing the evaluation of teaching in higher education. Higher Education, 34, 23-44.
1. Students can make reliable and valid judgements about
an instructor and certain aspects of instruction.
A. Reliability
Just as you would throw away a bathroom scale that gave you a different measure of your weight every time you stepped on it, an evaluation form with low reliability should also be thrown away. Fortunately, under best-case conditions, student evaluation forms have been shown to be reliable.
Reliability refers to
the consistency, stability, and replicability of a measurement.
The consistency of student evaluations refers to the extent to which students within the same class give similar ratings on a given question. Good consistency is achievable with class sizes greater than 30. Classes of 10 or fewer students will probably not produce adequate consistency.
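The link between class size and consistency can be illustrated with the Spearman-Brown formula, a standard psychometric result that estimates the reliability of an average of several raters. The sketch below is only illustrative: the single-rater reliability of 0.20 is a hypothetical value chosen for the example, not a figure from the reviews cited here, and the formula assumes that students are interchangeable raters.

```python
def reliability_of_class_mean(single_rater_reliability, n_raters):
    """Spearman-Brown estimate of the reliability of the mean of n raters."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Hypothetical value for illustration only; not taken from the literature.
ASSUMED_SINGLE_RATER_RELIABILITY = 0.20

for class_size in (5, 10, 15, 30, 50):
    estimate = reliability_of_class_mean(ASSUMED_SINGLE_RATER_RELIABILITY, class_size)
    print(f"{class_size:>3} students: estimated reliability {estimate:.2f}")
```

Under these assumptions the estimate rises steadily with class size, which is in line with the guidance that very small classes should not be relied on alone.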
The stability of student evaluations refers to the agreement among raters over time. Student evaluations tend to be fairly stable. Thus, one can expect to see good agreement between ratings given at the end of the semester and ratings given by those same students years after graduation. Our institution spends a lot of time and effort surveying graduates about teaching effectiveness for tenure decisions. The literature suggests that, because student evaluations are so stable, these surveys yield little if any new information.
The replicability of student evaluations refers to the extent to which the same instructor receives similar ratings for the same course over a number of semesters and across all of his or her courses. Replicability is high both for the same course over a number of semesters and for different courses taught by the same instructor.
Cashin (1995) provides the following guidelines for ensuring that student evaluations reach acceptable levels of reliability when they are used in personnel decisions.
1. Reliability will be achieved only to the extent that the surveys are well designed; thus, forms should be developed in consultation with someone knowledgeable about educational measurement.
2. Reliability will be achieved when using "ratings from a variety of courses, for two or more courses from every term for at least two years, totaling at least five courses".
3. If there are fewer than 15-20 students in any class, data from additional classes are recommended.
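As a rough illustration, guidelines 2 and 3 could be turned into a simple screening check on the set of ratings submitted for a personnel decision. Everything in this sketch is an assumption made for the example: the `CourseRating` record, the "YYYY-Term" label format, counting calendar years rather than academic years, and using 15 respondents (the lower end of the 15-20 range) as the small-class cutoff.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CourseRating:
    course: str       # hypothetical course label, e.g. "PSY 101"
    term: str         # assumed format "YYYY-Term", e.g. "2019-Fall"
    respondents: int  # number of students who completed the form


def check_guidelines(ratings):
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []
    if len(ratings) < 5:
        warnings.append("fewer than five courses in total")
    # Assumption: the text before the hyphen is the calendar year.
    years = {r.term.split("-")[0] for r in ratings}
    if len(years) < 2:
        warnings.append("ratings span fewer than two years")
    per_term = Counter(r.term for r in ratings)
    for term in sorted(per_term):
        if per_term[term] < 2:
            warnings.append(f"only one course rated in {term}")
    # Assumption: 15 is used as the cutoff from the 15-20 range.
    small = sorted(r.course for r in ratings if r.respondents < 15)
    if small:
        warnings.append("small classes, add data from other classes: " + ", ".join(small))
    return warnings
```

A check of this kind only flags thin data; it says nothing about whether the form itself was well designed, which is the subject of guideline 1.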
Aleamoni (1999) echoes Cashin's suggestions and further emphasizes the importance of consultation in the construction of the evaluation forms:
"It should be noted, however, that wherever student rating forms are not carefully constructed with the aid of professionals, as in the case of most student- and faculty-generated forms the reliabilities may be so low as to negate completely the evaluation effect and its results."
B. Validity
Although you might not throw away a scale that always reported your weight as ten pounds lighter than every other scale you have stepped on, you would know that the scale isn't a valid measure of your weight. A scale can be highly reliable (always giving you the same weight) but not valid (the weight is really ten pounds under your actual weight). Student evaluations can also be reliable (in the ways mentioned) but not valid. That is, student evaluations might not measure "effective teaching".
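The scale analogy can be made concrete with a few invented numbers. In this minimal sketch the true weight and both sets of readings are hypothetical: one scale is perfectly consistent but reads ten pounds light, while the other is centered on the true weight but erratic.

```python
from statistics import mean, pstdev

TRUE_WEIGHT = 150.0  # hypothetical true value, in pounds

# Invented readings for two imaginary scales.
consistent_but_biased = [140.0, 140.0, 140.0, 140.0]
erratic_but_centered = [138.0, 162.0, 145.0, 155.0]


def spread(readings):
    """Population standard deviation: low spread suggests high reliability."""
    return pstdev(readings)


def bias(readings, true_value):
    """Mean error against the true value: a bias near zero suggests validity."""
    return mean(readings) - true_value


for name, readings in [("consistent", consistent_but_biased),
                       ("erratic", erratic_but_centered)]:
    print(name, "spread:", round(spread(readings), 2),
          "bias:", round(bias(readings, TRUE_WEIGHT), 2))
```

With teaching there is no known "true weight" to compare against, which is why validity has to be judged indirectly, by checking whether ratings agree with other indicators of effective teaching.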
Validity refers to the extent to which student evaluations measure what we want them to measure, that is, good teaching. Several studies reported in the literature indicate that student evaluations can be valid measures of some aspects of teaching effectiveness. To illustrate, student ratings have been found to correlate with final exam performance, instructors' self-ratings, ratings by colleagues, and ratings by administrators. In addition, numerical ratings tend to correlate well with student comments on open-ended questions.
2. Some variables that are unrelated to teaching effectiveness
do correlate with student evaluations. In addition, some variables that have been purported to correlate
with student ratings do not.
When student evaluations are considered as part of a personnel evaluation, the variables that are unrelated to teaching effectiveness but do correlate with student evaluations should be taken into account. The variables listed below are those for which many studies have found a consistent pattern of correlation with student evaluations.
A. Elective courses are rated higher than required courses.
B. More advanced students give higher ratings than less advanced students.
C. Grades are weakly correlated with student ratings: higher grades are associated with somewhat higher ratings.
D. Humanities courses receive higher ratings than social science courses, and social science courses receive higher ratings than natural science courses.
The variables listed below
are the ones that many people believe are correlated with student ratings,
but for which inconsistent results have been found.
A. Size of the class (although keep in mind the issue of reliability when class size falls below 15).
B. Gender of the student.
C. Gender of the instructor.
D. An interaction between gender of the student and gender of the instructor.
E. Time of day that the course is offered.
F. Whether students are majors or non-majors.
G. Rank of instructor.
Information about the variables that affect student evaluations must be kept in mind when comparing evaluations from different courses. At the very least, department heads, council, and deans should be aware of which variables influence student evaluations without being related to teaching effectiveness. Furthermore, the information provided to those making personnel decisions must be updated periodically. Research on student evaluations is very active, and more researchers are beginning to evaluate how several variables interact in their effect on student evaluations. To ensure appropriate interpretation of the evaluations, up-to-date information must be provided.
3. Student evaluations are multidimensional. Contrary to some people's perceptions, student evaluations do not simply measure popularity. Most researchers find that at least six dimensions, or factors, are common to student rating forms. The factors are listed below. Any student evaluation form must have a few questions dedicated to assessing each of the six factors.
A. Course organization
B. Clarity, communication skills
C. Teacher/student interaction, or rapport
D. Course difficulty, workload
E. Grading and examinations
F. Student self-rated learning
All authors of the review articles cautioned that a single overall (or general) measure of teaching effectiveness is inadequate because single items are not reliable or valid. Furthermore, single items such as "in general how would you rate this teacher's effectiveness" tend to correlate with many more of the factors that are unrelated to teaching effectiveness (e.g., gender and class size).
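One way to report results by factor rather than as a single overall number is to average the items that belong to each of the six factors. The sketch below assumes a hypothetical form with two items per factor, labeled `q1` to `q12`; the item labels and the item-to-factor mapping are invented for illustration and would have to come from the actual form.

```python
from statistics import mean

# Hypothetical mapping of form items to the six factors; real forms will differ.
FACTOR_ITEMS = {
    "Course organization": ["q1", "q2"],
    "Clarity, communication skills": ["q3", "q4"],
    "Teacher/student interaction, or rapport": ["q5", "q6"],
    "Course difficulty, workload": ["q7", "q8"],
    "Grading and examinations": ["q9", "q10"],
    "Student self-rated learning": ["q11", "q12"],
}


def factor_scores(responses):
    """Average all answered items within each factor across all students.

    `responses` is a list of dicts, one per student, mapping item label to a
    numeric rating. A factor with no answered items gets None.
    """
    scores = {}
    for factor, items in FACTOR_ITEMS.items():
        values = [resp[item] for resp in responses for item in items if item in resp]
        scores[factor] = mean(values) if values else None
    return scores
```

Reporting a profile of six scores keeps the dimensions visible, so that a low rating on workload, for example, is not blended into an instructor's ratings on clarity or rapport.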
4. All authors of the review articles state that student evaluations must be used in conjunction with other methods of evaluating teaching. Pratt (1997) lists seven principles for a broader approach to evaluating teaching, in which student evaluations are only one component. The seven principles are as follows:
A. Evaluation should acknowledge and respect diversity in
actions, intentions, and beliefs.
B. Evaluation should involve multiple and credible sources
of data.
C. Evaluation should assess substantive, as well as technical,
aspects of teaching.
D. Evaluation should consider planning, implementation, and results of teaching.
E. Evaluation should contribute to the improvement of teaching.
F. Evaluation should be done in consultation with key individuals
responsible for taking data and recommendations forward within an institution.