Big names in statistics want to shake up much-maligned P value

Nature, Vol. 548, No. 7665. (26 July 2017), pp. 16-17


One of scientists’ favourite statistics, the P value, should face tougher standards, say leading researchers. [Excerpt] Science is in the throes of a reproducibility crisis, and researchers, funders and publishers are increasingly worried that the scholarly literature is littered with unreliable results. Now, a group of 72 prominent researchers is targeting what they say is one cause of the problem: weak statistical standards of evidence for claiming new discoveries. In many disciplines the significance of findings is judged by ...


Statistical analysis

In Science: editorial policies (2016)


[Excerpt: Statistical analysis] Generally, authors should describe statistical methods with enough detail to enable a knowledgeable reader with access to the original data to verify the results. [::] Data pre-processing steps such as transformations, re-coding, re-scaling, normalization, truncation, and handling of below detectable level readings and outliers should be fully described; any removal or modification of data values must be fully acknowledged and justified. [::] [...] [::] The number of sampled units, N, upon which each reported statistic is based must be stated. [::] For continuous ...


Sailing from the seas of chaos into the corridor of stability: practical recommendations to increase the informational value of studies

Perspectives on Psychological Science: A Journal of the Association for Psychological Science, Vol. 9, No. 3. (01 May 2014), pp. 278-292


Recent events have led psychologists to acknowledge that the inherent uncertainty encapsulated in an inductive science is amplified by problematic research practices. In this article, we provide a practical introduction to recently developed statistical tools that can be used to deal with these uncertainties when performing and evaluating research. In Part 1, we discuss the importance of accurate and stable effect size estimates as well as how to design studies to reach a corridor of stability around effect size estimates. In ...


Consistent and clear reporting of results from diverse modeling techniques: the A3 method

Journal of Statistical Software, Vol. 66, No. 7. (2015)


The measurement and reporting of model error is of basic importance when constructing models. Here, a general method and an R package, A3, are presented to support the assessment and communication of the quality of a model fit along with metrics of variable importance. The presented method is accurate, robust, and adaptable to a wide range of predictive modeling algorithms. The method is described along with case studies and a usage guide. It is shown how the method can be used ...


The statistical crisis in science

American Scientist, Vol. 102, No. 6. (2014), p. 460


Data-dependent analysis, a “garden of forking paths”, explains why many statistically significant comparisons don't hold up. [Excerpt] There is a growing realization that reported “statistically significant” claims in scientific publications are routinely mistaken. Researchers typically express the confidence in their data in terms of p-value: the probability that a perceived result is actually the result of random variation. The value of p (for “probability”) is a way of measuring the extent to which a data set provides evidence against a so-called null hypothesis. ...


Statistics: P values are just the tip of the iceberg

Nature, Vol. 520, No. 7549. (28 April 2015), p. 612


Ridding science of shoddy statistics will require scrutiny of every step, not merely the last one, say Jeffrey T. Leek and Roger D. Peng. [Excerpt] There is no statistic more maligned than the P value. Hundreds of papers and blogposts have been written about what some statisticians deride as 'null hypothesis significance testing' (NHST). NHST deems whether the results of a data analysis are important on the basis of whether a summary statistic (such as a P value) ...


The extent and consequences of p-hacking in science

PLoS Biology, Vol. 13, No. 3. (13 March 2015), e1002106


A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as “p-hacking,” occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant. Here, we use text-mining to demonstrate that p-hacking is widespread throughout science. We then illustrate how one can test for p-hacking when performing a meta-analysis and show that, while p-hacking is probably common, its effect seems to be weak relative to the ...


Editorial

Basic and Applied Social Psychology, Vol. 37, No. 1. (2 January 2015), pp. 1-2


[Excerpt] The Basic and Applied Social Psychology (BASP) 2014 Editorial emphasized that the null hypothesis significance testing procedure (NHSTP) is invalid, and thus authors would be not required to perform it (Trafimow, 2014). However, to allow authors a grace period, the Editorial stopped short of actually banning the NHSTP. The purpose of the present Editorial is to announce that the grace period is over. From now on, BASP is banning the NHSTP. With the banning of the NHSTP from BASP, what are ...


Scientific method: statistical errors

Nature, Vol. 506, No. 7487. (12 February 2014), pp. 150-152


P values, the 'gold standard' of statistical validity, are not as reliable as many scientists assume. ...

