Internet Learning Volume 3, Number 1, Spring 2014 | Page 32
Continuous Improvement of the QM Rubric and Review Processes
Differences in Review Success by
Course Discipline
Analysis of courses reviewed from 2011
through July 2013 revealed that business
courses tended to have the best outcomes.
Business courses were most likely to meet
standards in the initial review, followed by
education courses. Business courses also
had the highest total scores. Courses in the
remaining disciplines did not significantly
differ from one another.
Relationship between Faculty Developer/Instructor of
Reviewed Course and Familiarity with the QM Rubric
In the analyses of the 2011–2013
course reviews, courses submitted by individuals
familiar with QM had higher initial
scores than courses submitted by individuals
who were not familiar with QM (Mann–
Whitney U (N = 1,488) = 43,537, p < .001).
However, total points did not differ significantly
after amendment (Mann–Whitney U
(N = 1,488) = 61,900, p = .108). (The amendment
phase includes interaction with the
peer review team.)
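The Mann–Whitney U statistic reported above can be computed with the standard rank-sum formula. The sketch below is a minimal pure-Python illustration of that computation; the scores are invented for demonstration and are not the study's data.

```python
# Sketch: Mann-Whitney U for two small groups, e.g. initial review scores
# of submitters familiar vs. not familiar with the QM Rubric.
# The scores below are illustrative only.

def mann_whitney_u(group_a, group_b):
    """Return the U statistic for group_a, using average ranks for ties."""
    combined = sorted(group_a + group_b)
    # Map each distinct value to its average 1-based rank (handles ties).
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    r_a = sum(ranks[v] for v in group_a)      # rank sum for group A
    n_a = len(group_a)
    return r_a - n_a * (n_a + 1) / 2          # U = R_A - n_A(n_A + 1)/2

familiar = [85, 90, 78, 92, 88]
not_familiar = [70, 75, 80, 72]
print(mann_whitney_u(familiar, not_familiar))  # → 19.0
```

In practice a library routine (e.g. a statistics package's Mann–Whitney implementation) would also supply the p-value; this sketch shows only the test statistic itself.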
The familiarity of faculty developers
and instructors with the QM Rubric was examined
in relation to the outcome of the initial
course review and the amended course
review (when needed). In the analysis of the
2011–2013 course reviews, the majority (93.3%) of
individuals who submitted courses for review
were familiar with the Rubric; only 98
of the 1,492 individuals (6.6%) stated that
they were not familiar with the Rubric.
Proportion of Rater Agreement by
Specific Standards
Measures of reliability are often given
when discussing scores such as those
assigned using the QM Rubric. The term
“reliability” refers to consistency of results.
Inter-rater reliability is a measure of
the relationship between scores assigned
by different individuals (Hogan, 2007). In
its strictest sense, however, inter-rater reliability
works under the assumption that
reviewers are randomly selected and interchangeable
(see Suen, Logan, Neisworth,
& Bagnato, 1995). This assumption is not
met in QM's process, in which reviewers
may be selected on the basis of their previous
experience or areas of expertise. The
measurement of interest concerning the
QM Rubric is the proportion of reviews
in which all three raters assigned the same
rating to a specific standard (i.e., all three
reviewers assessed a standard as met or not
yet met). This is different from inter-rater
reliability in that it is not an attempt at describing
unsystematic variance (see Hallgren,
2012; Liao, Hunt, & Chen, 2010); its
purpose is to provide an easily interpretable
statistic that will allow for the comparison
of specific standards for practical
purposes. Thus, in discussing the consistency
of results of QM's reviews, the term
proportion of rater agreement is used because it
explicitly describes the analyses performed,
rather than inter-rater reliability, which,
strictly speaking, it is not.
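The statistic described above reduces to a simple fraction: the share of reviews in which all three reviewers made the same met / not-met decision on a given specific standard. A minimal sketch, using hypothetical ratings rather than the study's data:

```python
# Sketch: proportion of rater agreement for one specific standard.
# Each tuple holds the three reviewers' decisions for one review.

def proportion_agreement(reviews):
    """Fraction of reviews in which all reviewers gave the same rating."""
    unanimous = sum(1 for ratings in reviews if len(set(ratings)) == 1)
    return unanimous / len(reviews)

# Hypothetical met/not-met decisions across five reviews.
standard_ratings = [
    ("met", "met", "met"),
    ("met", "not", "met"),
    ("met", "met", "met"),
    ("not", "not", "not"),
    ("met", "met", "not"),
]
print(proportion_agreement(standard_ratings))  # → 0.6
```

Unlike chance-corrected indices such as Cohen's or Fleiss's kappa, this proportion makes no adjustment for agreement expected by chance, which is what makes it easy to interpret when comparing specific standards.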
One of the primary purposes of analyzing
proportion of rater agreement is to
identify specific standards that may require
attention to keep the Rubric reflective of
the research and fields of practice while
being workable for a team of inter-institutional,
interdisciplinary academic peers.
A specific standard for which reviewers