Grading software is used by some universities and US states to mark
exams. But manufacturers' claims that the systems can match human raters
have never been comprehensively assessed until now, says Jaison Morgan
of The Common Pool, a consultancy based in Santa Monica, California.
To compare human and machine graders, Morgan and Mark Shermis of the
University of Akron, Ohio, obtained over 16,000 essays from six state
education departments. The essays covered a range of topics and had
already been marked by at least one trained human grader.
Grading software from nine manufacturers, which together cover 97% of
the US market, was used in the test. To calibrate them, each system
looked for correlations between factors associated with good essays,
such as a strong vocabulary and sound grammar, and the scores the human
graders had assigned. After this training, the software marked a
separate set of essays without access to the human-assigned grades.
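The article does not reveal how the vendors' systems work internally. As a rough illustration of the calibrate-then-predict workflow described above, here is a minimal sketch in Python, assuming a simple least-squares model over two toy features (vocabulary richness and average word length). The feature set, function names, and model are assumptions made for illustration, not any vendor's actual method.

```python
# Illustrative sketch only: the article does not disclose the vendors'
# algorithms. This assumes a simple linear model over hand-picked features.
import numpy as np

def extract_features(essay: str) -> list[float]:
    """Toy feature extractor: vocabulary richness and average word length.
    Real systems presumably use far richer feature sets (an assumption)."""
    words = essay.lower().split()
    if not words:
        return [0.0, 0.0]
    vocab_richness = len(set(words)) / len(words)  # type/token ratio
    avg_word_len = sum(len(w) for w in words) / len(words)
    return [vocab_richness, avg_word_len]

def calibrate(essays: list[str], human_scores: list[float]) -> np.ndarray:
    """Fit least-squares weights mapping features to human-assigned scores."""
    X = np.array([extract_features(e) + [1.0] for e in essays])  # bias term
    y = np.array(human_scores)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def grade(essay: str, weights: np.ndarray) -> float:
    """Score an unseen essay with no access to any human grade."""
    x = np.array(extract_features(essay) + [1.0])
    return float(x @ weights)

# Train on essays with known human scores, then mark a fresh essay.
training = ["The quick brown fox jumps over the lazy dog repeatedly.",
            "dog dog dog dog dog"]
scores = [4.0, 1.0]
w = calibrate(training, scores)
print(grade("A nimble red fox vaults across the sleeping hound.", w))
```

The point of the sketch is only the two-phase structure the article describes: fit a model against human-assigned scores, then score a separate set of essays without human input.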
The essay marks handed out by the machines were statistically identical
to those from the human graders, says Morgan. This is an important
finding, he says, because teachers often avoid assigning essays simply
because they lack the time to mark them. The results should encourage
educators to use automated systems more widely, he adds.
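The article does not say which statistic was used to judge the machine and human marks "statistically identical". One minimal check a reader could run, sketched below with invented scores, is the correlation and exact-agreement rate between the two sets of marks; this is an assumed stand-in, not the study's actual analysis.

```python
# Hypothetical agreement check; the study's statistic is not named in the
# article, and these scores are invented for illustration.
import numpy as np

human = np.array([3, 4, 2, 5, 4, 3, 1, 5])
machine = np.array([3, 4, 3, 5, 4, 2, 1, 5])

r = np.corrcoef(human, machine)[0, 1]  # Pearson correlation
exact = np.mean(human == machine)      # share of essays scored identically
print(f"correlation={r:.2f}, exact agreement={exact:.0%}")
```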