# P-values: friend or foe?

Neyman & Pearson wrote in 1928:

“it is doubtful whether the knowledge that [a P value] was really 0.03 (or 0.06), rather than 0.05…would in fact ever modify our judgment”

“The tests themselves give no final verdict, but as tools help the worker who is using them to form his final decision.”

P-values (and derivatives such as confidence intervals) are notoriously used and misused in science. In its simplest form, a p-value is the probability of observing a result equal to or more extreme than the one actually observed, assuming the null hypothesis is true. The null hypothesis typically states that there is no real difference between two outcomes (any observed difference being due to measurement error or noise). Abuse and misunderstanding of p-values are so widespread in the scientific literature that articles are regularly retracted over them, and some journals have banned p-values altogether. An important point is that a p-value assumes the test hypothesis (and the full statistical model behind it) is true, and simply measures how far the data deviate from that model. Hence, a p-value of 0.2 only indicates that the data are closer to the statistical model's predictions than they would be at p = 0.02. The original intent of p-values was to caution against over-interpreting chance associations as true effects.
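The definition above can be made concrete with a small simulation. The sketch below (an illustrative permutation test, not a prescribed method) computes a p-value directly from its definition: the fraction of random relabelings of the data whose test statistic is at least as extreme as the observed one. Here both groups are drawn from the same distribution, so the null hypothesis is true by construction.

```python
import random
import statistics

random.seed(1)

def permutation_p_value(a, b, n_perm=2000):
    """Two-sided permutation test: the p-value is the fraction of label
    shufflings whose absolute mean difference is at least as extreme as
    the observed one."""
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    n = len(a)
    extreme = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n]) - statistics.mean(pooled[n:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_perm

# Two samples drawn from the same distribution: the null hypothesis holds
a = [random.gauss(0, 1) for _ in range(30)]
b = [random.gauss(0, 1) for _ in range(30)]
print(permutation_p_value(a, b))
```

Because the null is true here, repeating this experiment many times would yield p-values spread roughly uniformly between 0 and 1; a single small p-value is not, by itself, proof of an effect.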

Correct use of p-values requires careful interpretation of effect sizes and sample sizes, and less attention to pre-specified magic thresholds such as p = 0.05. Study design is often more important still: are there hidden confounders in the experiment? Do not claim that a non-significant hypothesis test supports the test hypothesis; at best, the data are simply not inconsistent with it, and they may be just as compatible with many alternative hypotheses. At present, when genomics is the leading light, be wary of p-values for very large data sets: with enough data, even trivially small effects produce p-values that approach zero. Greenland *et al.* provide an excellent discussion of p-values that every scientist should read.
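The large-data caveat can be sketched numerically. The example below (an idealized one-sample z-test with known standard deviation, where the sample mean is assumed to equal the tiny true effect exactly, with no sampling noise) shows how a practically irrelevant effect of 0.05 standard deviations drives the p-value toward zero as the sample size grows:

```python
import math

def z_test_p(effect, sigma, n):
    """Two-sided p-value for a one-sample z-test, in the idealized case
    where the observed sample mean equals `effect` exactly."""
    z = abs(effect) * math.sqrt(n) / sigma
    # Two-sided p = 2 * (1 - Phi(z)), via the complementary error function
    return math.erfc(z / math.sqrt(2))

# A fixed, tiny effect: the p-value shrinks as n grows
for n in (100, 10_000, 1_000_000):
    print(n, z_test_p(0.05, 1.0, n))
```

The effect size never changes; only the sample size does. This is why, for large data sets, reporting effect sizes with uncertainty intervals is far more informative than reporting that p is vanishingly small.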