A carefully tamed p-value is the bioinformatician's most faithful friend

Jacques Van Helden (Université d'Aix-Marseille)
Friday, June 5, 2015 - 10:30
Room Aurigny
Talk abstract: 

The p-value has recently been questioned in several publications. Halsey et al. (2015) argue that the wide sample-to-sample variability of the p-value is a major cause of the lack of reproducibility of published research. They propose to replace statistical testing by an inspection of confidence intervals around the estimated effect size. Even though the authors raise a relevant concern about the instability of statistical tests with small samples, the p-value was made the scapegoat for sins that have other sources: limited sample sizes, misconceptions about the p-value, and over-interpretation of significance. The alleged fickleness of the p-value seems to boil down to a rephrasing of the well-known problem of small-sample fluctuations. Ironically, the proposed solution suffers from the same instability, since confidence intervals are computed from the same estimators (mean, standard deviation), and their width depends on Student's $t$ distribution. Thus, the proposed alternative (discounting the p-value and focusing on effect size and confidence intervals) offers no solution to the real sources of instability of the observations. Moreover, it would be of no use in bioinformatics, where a single analysis can encompass thousands, millions or billions of tests. In this article, I propose to combine several strategies to strengthen the reliability and interpretability of statistical tests in the context of high-throughput data analysis: (i) inspecting p-values and derived statistics as continuous variables rather than setting an arbitrary cut-off; (ii) coupling the analysis of the actual datasets with in silico negative and positive controls; (iii) analyzing the full empirical distributions of p-values; (iv) bootstrapping the samples. These approaches turn the usual difficulties raised by multiple testing into an advantage, by giving insight into the global properties of the datasets, thereby enabling a contextual interpretation of individual tests.
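The argument that confidence intervals inherit the instability of the p-value can be illustrated with a small simulation. This is a minimal sketch, not the author's code, and it rests on stated assumptions: a one-sample two-sided test of H0: mu = 0 with a known standard deviation, so the normal distribution stands in for Student's t (keeping the example to the Python standard library). With an estimated standard deviation the same correspondence holds using t quantiles. The function names `z_test_and_ci` and `replicate_experiments`, the effect size 0.5 and the sample size n = 10 are hypothetical choices.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def z_test_and_ci(sample, sigma=1.0, alpha=0.05):
    """Two-sided one-sample z-test of H0: mu = 0 (sigma assumed known),
    plus the matching (1 - alpha) confidence interval around the mean."""
    n = len(sample)
    mean = sum(sample) / n
    se = sigma / math.sqrt(n)               # standard error of the mean
    z = mean / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| > |z|) under H0
    half_width = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) * se
    return p, (mean - half_width, mean + half_width)


def replicate_experiments(effect=0.5, n=10, replicates=20, seed=1):
    """Draw `replicates` independent small samples from the SAME population
    (true mean = effect, sd = 1) and test each one."""
    rng = random.Random(seed)
    results = []
    for _ in range(replicates):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        results.append(z_test_and_ci(sample))
    return results


if __name__ == "__main__":
    for p, (low, high) in replicate_experiments():
        excludes_zero = low > 0 or high < 0
        print(f"p = {p:.4f}  CI = [{low:+.2f}, {high:+.2f}]  excludes 0: {excludes_zero}")
```

Across replicates drawn from one population, the p-value and the interval fluctuate together: the 95% interval excludes 0 exactly when p < 0.05, because both are functions of the same standardized mean. Switching from one to the other therefore does not remove the small-sample fluctuation.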
I demonstrate that, when adequately treated and interpreted, p-values and derived statistics provide reliable tools to estimate not only the significance but also the robustness of the results.
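Strategies (i) to (iii) can be sketched on simulated data. This is a hypothetical illustration, not the procedure presented in the talk, and it relies on several assumptions: each "test" is a one-sample z-test with known standard deviation; an in silico negative control is a dataset generated with no true effect; and the derived statistics are taken to be the E-value (p-value multiplied by the number of tests, i.e. the number of tests expected at or below that p-value if all were null) and sig = -log10(E-value), read as continuous scores. All function names and parameter values are illustrative.

```python
import math
import random


def z_pvalue(sample, sigma=1.0):
    """Two-sided p-value of a one-sample z-test of H0: mu = 0 (sigma known)."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_pvalues(n_tests, n=10, frac_true=0.0, effect=1.0, seed=0):
    """Run n_tests independent tests; the first frac_true * n_tests of them
    have a real effect. frac_true = 0 yields an in silico negative control."""
    rng = random.Random(seed)
    pvals = []
    for i in range(n_tests):
        mu = effect if i < frac_true * n_tests else 0.0
        pvals.append(z_pvalue([rng.gauss(mu, 1.0) for _ in range(n)]))
    return pvals


def derived_stats(p, n_tests):
    """E-value = p * n_tests; sig = -log10(E-value) (higher = more significant)."""
    e_value = p * n_tests
    sig = math.inf if e_value == 0 else -math.log10(e_value)
    return e_value, sig


def histogram(pvals, bins=10):
    """Counts of p-values in `bins` equal-width bins over [0, 1]."""
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    N = 10_000
    negative = simulate_pvalues(N, frac_true=0.0, seed=1)  # no true effect
    actual = simulate_pvalues(N, frac_true=0.1, seed=2)    # 10% true effects
    print("negative control:", histogram(negative))
    print("actual data:     ", histogram(actual))
    # Number of tests passing a threshold in each dataset: the negative
    # control gives an empirical estimate of the false positives expected.
    for threshold in (0.01, 0.001):
        n_neg = sum(p < threshold for p in negative)
        n_act = sum(p < threshold for p in actual)
        print(f"p < {threshold}: negative={n_neg}  actual={n_act}")
```

Under the negative control the p-value histogram is roughly flat, as expected for a uniform null distribution; the excess in the lowest bins of the "actual" data reflects true effects. Comparing both distributions, rather than applying a single cut-off, is what gives the individual tests their context.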

Halsey, L.G., Curran-Everett, D., Vowler, S.L. and Drummond, G.B. (2015) The fickle P value generates irreproducible results. Nature Methods, 12, 179–185.
Jacques van Helden
Lab. Technological Advances for Genomics and Clinics (TAGC), INSERM Unit U1090, Aix-Marseille Université (AMU).