Deborah Mayo (Virginia Tech)
This paper discusses the 2011-2015 Reproducibility Project, an attempt to replicate published statistically significant results in psychology. We set out key elements of significance tests, which are often misunderstood. While significance tests are intended to bound the probabilities of erroneous interpretations of data, this error control is nullified by cherry-picking, multiple testing, and other biasing selection effects. However, the reason to question the resulting inference is not a matter of poor long-run error rates, but rather that the inference has not been well tested by these data. This provides a rationale, never made clear by significance testers, for the inferential relevance of error probabilities.
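The claim that selection effects nullify error control can be made concrete with a small simulation (not from the paper; all parameter values below are illustrative assumptions). When many hypotheses are tried and only the smallest p-value is reported, the probability of erroneously declaring a "significant" effect far exceeds the nominal level.

```python
"""Hedged sketch: how cherry-picking from multiple tests inflates the
nominal Type I error rate of a significance test. Parameters are illustrative."""

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

n_trials = 10_000  # simulated studies, each with no true effect
n_tests = 20       # hypotheses (or outcome measures) tried per study
n_obs = 30         # observations per test
alpha = 0.05       # nominal significance level

false_positives = 0
for _ in range(n_trials):
    # Under the null all samples come from N(0, 1), so any "significant"
    # result is erroneous. Cherry-picking = reporting the smallest p-value.
    p_values = [
        stats.ttest_1samp(rng.normal(0.0, 1.0, n_obs), 0.0).pvalue
        for _ in range(n_tests)
    ]
    if min(p_values) < alpha:
        false_positives += 1

print(f"Nominal error rate: {alpha:.2f}")
print(f"Rate of reporting a spurious 'significant' effect: "
      f"{false_positives / n_trials:.2f}")  # roughly 1 - 0.95**20, about 0.64
```

Under these assumptions the reported error probability no longer reflects how severely the inferred claim was actually probed, which is the sense in which the resulting inference has not been well tested by the data.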