## God particle, 5 sigma, and p-value

About a week ago, CERN announced the discovery of a new subatomic particle that’s consistent with the properties of the elusive Higgs boson, a.k.a. the God Particle. CERN scientists say it is a 5 sigma result. Interestingly, almost all the news reports I read converted this 5 sigma into a percentage, yet none seemed able to explain what exactly 5 sigma is. Some even mistakenly claimed that scientists were “99.999% sure God Particle has been found.”

Actually, 5 sigma is just another way of stating a probability value, in other words, a p-value. So what is a p-value anyway? The p-value is the probability that the data would be at least as extreme as those observed, if the null hypothesis were true.
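To make the definition concrete, here is a small simulation sketch in R. It assumes the test statistic is a z-score that follows N(0,1) when the null hypothesis is true, and it uses a hypothetical observed value `z.obs = 2`, because a 5 sigma event is too rare to hit reliably by brute-force sampling. The p-value is then simply the fraction of null draws at least as extreme as the observed one (one-sided, as in the 5 sigma convention).

```r
set.seed(42)                 # reproducible draws
z.obs  <- 2                  # hypothetical observed z-score, for illustration only
n.sims <- 1e6

# Draw the test statistic many times assuming the null hypothesis, N(0, 1)
z.null <- rnorm(n.sims)

# One-sided p-value: share of null draws at least as extreme as z.obs
p.sim   <- mean(z.null >= z.obs)

# Exact value from the normal upper tail, for comparison
p.exact <- pnorm(z.obs, lower.tail = FALSE)

print(c(simulated = p.sim, exact = p.exact))
```

The two numbers should agree to about three decimal places; the exact upper-tail probability beyond 2 sigma is about 0.0228.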

The standard normal distribution N(0,1) has mean μ = 0 and variance σ² = 1. As you can see from the graph, a little more than 2/3 of values drawn from a normal distribution lie within one standard deviation (one sigma) of the mean (red area). Approximately 95% of the values lie within two sigma of the mean (exactly 95% lies within 1.96 sigma). Three sigma covers about 99.7% of the area under the density curve (AUC).
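These coverage figures can be checked directly with R’s `pnorm`, the standard normal CDF. A minimal sketch that computes the two-sided area P(−k ≤ Z ≤ k) for k = 1 to 5:

```r
# Share of N(0, 1) mass within k standard deviations of the mean, k = 1..5
k <- 1:5
coverage <- pnorm(k) - pnorm(-k)
print(data.frame(k = k, coverage = round(coverage, 7)))

# The two-sided 95% interval actually uses about 1.96 sigma, not exactly 2
print(qnorm(0.975))
```

For k = 1 to 3 this gives roughly 0.683, 0.954 and 0.997, matching the rule of thumb; for k = 5 the two-sided coverage is about 0.9999994.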

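Five sigma? The probability that a standard normal value falls below 5 is about 0.9999997, so the one-sided tail beyond 5 sigma is about 0.0000003 (roughly 3 × 10⁻⁷); that is the significance level (alpha) set by the 5 sigma threshold. In short, there is the null hypothesis (no God particle) and the alternative hypothesis (God particle exists). Five sigma means that, if the null hypothesis were true, there would be only a very slim chance (less than one in a million, about one in 3.5 million) of observing data at least as extreme as what was seen. Note that this is not equivalent to scientists being 99.99997% sure that the alternative hypothesis is correct.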
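Written out, with Z the test statistic, Φ the standard normal CDF and H₀ the null hypothesis (notation assumed here for illustration), the 5 sigma statement is a tail probability computed assuming H₀ is true:

```latex
p = P(Z \ge 5 \mid H_0) = 1 - \Phi(5) \approx 2.87 \times 10^{-7}
```

The probability that the null hypothesis is true given the data, P(H₀ | data), is a different quantity: by Bayes’ theorem it also depends on a prior P(H₀), which the p-value does not involve.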

```r
> pnorm(5)
[1] 0.999999713348428
> 1-pnorm(5)
[1] 0.000000286651571923535
```
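`1 - pnorm(5)` works here, but subtracting from 1 loses precision as z grows. A sketch that instead asks `pnorm` and `qnorm` for the upper tail directly via `lower.tail = FALSE`, converting sigma to a one-sided p-value and back; the helper names `sigma.to.p` and `p.to.sigma` are hypothetical, not part of base R.

```r
# One-sided tail probability beyond z standard deviations (hypothetical helper)
sigma.to.p <- function(z) pnorm(z, lower.tail = FALSE)

# Inverse: the number of sigma matching a one-sided p-value (hypothetical helper)
p.to.sigma <- function(p) qnorm(p, lower.tail = FALSE)

p5 <- sigma.to.p(5)     # about 2.87e-07, roughly 1 in 3.5 million
print(p5)
print(p.to.sigma(p5))   # back to 5
print(2 * p5)           # two-sided: beyond 5 sigma in either direction

# At 10 sigma, pnorm(10) equals 1 in double precision, so 1 - pnorm(10) is 0,
# while the direct upper-tail call still returns a tiny positive number
print(1 - pnorm(10))
print(sigma.to.p(10))
```

The two-sided value is simply twice the one-sided one, because the normal distribution is symmetric.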

Here is the code for the graph. I wrote it in a hurry, so any suggestions to make it better are welcome.

```r
my.color <- rainbow(10)
my.symbol2 <- expression(mu)
my.axis <- -6:6
my.label <- c('-6', '-5', '-4', '-3', '-2', '-1', my.symbol2, '1', '2', '3', '4', '5', '6')
x <- seq(-6, 6, length.out = 600)
y <- dnorm(x)

# Empty plot frame; the shaded bands are added below
plot(x, y, type = "n", xlab = ' ', ylab = ' ', axes = FALSE)

# Shade the area under the standard normal density between start and end
plotsigma <- function(start, end, color) {
  sigmax <- seq(start, end, length.out = 100)
  sigmay <- c(0, dnorm(sigmax), 0)
  sigmax <- c(start, sigmax, end)
  polygon(sigmax, sigmay, col = color, border = NA)
}

# Draw the widest band (5 sigma) first so the narrower bands sit on top
for (i in 5:1) {
  plotsigma(-i, i, my.color[i])
}

axis(1, at = my.axis, labels = my.label)
lines(x, y)
segments(0, 0.4, 0, 0, col = 'white')  # white line marking the mean
segments(5, 0.2, 5, 0, lty = 3)        # dotted line at 5 sigma
text(5, 0.22, expression(5 * sigma))
```

I also found the reporting on this interesting. Glad that you noticed that too.

Guest, July 16, 2012 at 2:12 pm

I only just started learning R, but I think the code is good. I wish the colors could change a little bit, though.

R beginner, July 23, 2012 at 7:34 am