When Dutch researchers developed an open-source algorithm designed to
flag statistical errors in psychology papers, it drew mixed reactions
from the research community, especially after the free tool was run on
tens of thousands of papers and the results were posted online. But the
tool, named statcheck, actually gets it right in more than 95% of
cases, its developers claim in a new study.
Statcheck, developed in 2015, scours papers for statistical results
reported in the standard format prescribed by the American Psychological
Association and uses the test statistics and degrees of freedom to
recalculate the p-value, a controversial but widely used measure of
statistical significance. If the recalculated p-value differs from the
one reported by the researchers, the result is flagged as an
'inconsistency'. If the reported p-value is below the commonly used
threshold of 0.05 and statcheck's figure isn't, or vice versa, the
result is labelled a 'gross inconsistency' that may call the paper's
conclusions into question.
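The core check is simple enough to sketch. The Python fragment below is a
hypothetical, much-simplified stand-in for the real tool, which is an R
package; the function name, regular expression and rounding rule are
illustrative assumptions, and it handles only exactly reported two-tailed
t-tests:

```python
import re
from scipy import stats

# Illustrative sketch only: the real statcheck also handles F, chi-squared,
# correlation and z statistics, 'p <' thresholds and one-tailed tests.
APA_T = re.compile(r"t\((?P<df>\d+)\)\s*=\s*(?P<stat>-?\d+\.?\d*),\s*"
                   r"p\s*=\s*(?P<p>\d?\.\d+)")

def check_result(apa_string, alpha=0.05):
    """Recompute the p-value of a reported two-tailed t-test and classify it."""
    m = APA_T.search(apa_string)
    if m is None:
        return None                                # no exactly reported t-test
    df, stat = int(m["df"]), float(m["stat"])
    p_reported = float(m["p"])
    p_computed = 2 * stats.t.sf(abs(stat), df)     # two-tailed p-value
    # Consistent if the recomputed p rounds to the reported precision.
    decimals = len(m["p"].split(".")[1])
    if round(p_computed, decimals) == round(p_reported, decimals):
        return "consistent"
    # Gross inconsistency: reported and recomputed p straddle the threshold.
    if (p_reported < alpha) != (p_computed < alpha):
        return "gross inconsistency"
    return "inconsistency"

print(check_result("t(28) = 2.20, p = .036"))  # consistent
print(check_result("t(28) = 1.20, p = .030"))  # gross inconsistency
```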
In a 2015 study, the team ran statcheck on more than 30,000 psychology
papers and found that half contained at least one statistical
inconsistency, and one in eight had a gross inconsistency. Last year,
researchers used statcheck to analyse just under 700,000 results
reported in more than 50,000 psychology studies, and had its findings
automatically posted on the post-publication peer-review site PubPeer,
with email notifications sent to the authors. Some researchers welcomed the
feedback, but the German Psychological Society (DGPs) said the postings
were causing needless reputational damage.
Whether statcheck is fair depends in part on its accuracy. For the new
paper, the team ran statcheck on 49 papers that colleagues had checked
for statistical inconsistencies by hand in a study published in 2011.
They found that the algorithm's 'true positive rate', the share of
genuine inconsistencies it flags, lies between 85.3% and 100%, and that
its 'true negative rate', the share of consistent results it correctly
leaves alone, lies between 96% and 100%. Combined, those numbers mean
that statcheck reaches the right verdict on the results it extracts
between 96.2% and 99.9% of the time.
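The two rates combine into an overall accuracy through the base rate of
genuine errors among extracted results. A minimal sketch of that
arithmetic, assuming a purely hypothetical 10% error prevalence (the
paper's actual figure is not given here, so these outputs will not
reproduce its 96.2%–99.9% range exactly):

```python
def overall_accuracy(sensitivity, specificity, prevalence):
    # P(correct) = P(genuine error) * true-positive rate
    #            + P(no error)      * true-negative rate
    return prevalence * sensitivity + (1 - prevalence) * specificity

# 0.10 is an assumed prevalence, chosen only for illustration.
print(overall_accuracy(0.853, 0.96, 0.10))  # lower-bound rates -> ~0.949
print(overall_accuracy(1.0, 1.0, 0.10))     # upper-bound rates -> 1.0
```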