SAS® and R

Best of Both Worlds

God particle, 5 sigma, and p-value


About a week ago, CERN announced the discovery of a new subatomic particle consistent with the properties of the elusive Higgs boson, a.k.a. the "God particle". CERN scientists say it is a 5 sigma result. Interestingly, almost all the news reports I read converted this 5 sigma into a percentage, and none seemed able to explain what 5 sigma actually means. Some even mistakenly claimed that scientists are “99.999% sure God Particle has been found.”

Actually, 5 sigma is just another way of stating a probability value, in other words, a p-value. So what is a p-value anyway? The p-value is the probability that the data would be at least as extreme as those observed if the null hypothesis were true.
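To make the definition concrete, here is a small Monte Carlo sketch (not part of the original post). It assumes the test statistic follows a standard normal distribution when the null hypothesis is true, and uses a hypothetical observed value `z_obs = 2`. The p-value is then simply the share of "null world" draws that are at least as extreme as the observed value, counting only the upper tail (a one-sided test).

```r
# Monte Carlo illustration of the p-value definition (illustrative sketch).
# Assumption: under the null hypothesis the test statistic is N(0, 1).
set.seed(2012)
z_null <- rnorm(1e6)   # one million draws of the statistic when the null is true
z_obs  <- 2            # hypothetical observed statistic, chosen for illustration

# One-sided p-value: fraction of null draws at least as large as z_obs
p_sim   <- mean(z_null >= z_obs)

# The same probability computed exactly from the normal distribution
p_exact <- pnorm(z_obs, lower.tail = FALSE)

print(c(simulated = p_sim, exact = p_exact))
```

The simulated and exact values agree to about three decimal places; increasing the number of draws narrows the gap further.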

The standard normal distribution N(0,1) has μ = 0 and σ² = 1. As the graph (generated by the R code later in this post) shows, a little more than 2/3 (about 68.3%) of values drawn from a normal distribution lie within one standard deviation (one sigma) of the mean (red area). Approximately 95% of the values lie within two sigma of the mean (more precisely, 95% lies within 1.96 sigma). Three sigma covers about 99.7% of the AUC (area under the density curve).
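These coverage figures can be checked directly in R. The sketch below computes P(-k < Z < k) = pnorm(k) - pnorm(-k) for k = 1 to 5, and uses qnorm to find the exact multiplier that leaves 95% of the area in the middle. Note that the two-sided coverage for k = 5 comes out at about 0.9999994, while the 0.9999997 figure used for "five sigma" refers to a single (upper) tail.

```r
# Probability that a standard normal value lies within k standard deviations
# of the mean: P(-k < Z < k) = pnorm(k) - pnorm(-k)
k <- 1:5
coverage <- pnorm(k) - pnorm(-k)
print(data.frame(k = k, coverage = signif(coverage, 7)))

# Multiplier that leaves 95% in the middle (2.5% in each tail)
z95 <- qnorm(0.975)
print(z95)   # about 1.96
```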

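Five sigma? Particle physicists quote it as a one-sided tail: the probability that a standard normal value falls below 5 sigma is about 0.9999997, so the tail probability beyond 5 sigma, which serves as the significance level (alpha), is about 0.0000003. In short, there is the null hypothesis (no God particle) and the alternative hypothesis (the God particle exists). Five sigma means that, if the null hypothesis were true, there would be a very slim chance (less than one in a million, roughly one in 3.5 million) of observing data at least as extreme as what was seen. It does not mean there is a one-in-a-million chance that the null hypothesis is true, and it is not equivalent to scientists being 99.99997% sure that the alternative hypothesis is correct.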

> pnorm(5)
[1] 0.999999713348428
> 1-pnorm(5)
[1] 0.000000286651571923535
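A slightly more robust way to get the tail probability is to ask pnorm for the upper tail directly with `lower.tail = FALSE`. For very large sigma values, `1 - pnorm()` loses precision because it subtracts two nearly equal numbers. The sketch below wraps the conversion in two small helpers; `sigma_to_p` and `p_to_sigma` are hypothetical names introduced here for illustration, built only on the standard `pnorm` and `qnorm` functions.

```r
# Hypothetical helpers: convert between "n sigma" and a one-sided tail probability
sigma_to_p <- function(n_sigma) pnorm(n_sigma, lower.tail = FALSE)
p_to_sigma <- function(p) qnorm(p, lower.tail = FALSE)

p5 <- sigma_to_p(5)
print(p5)              # about 2.87e-07
print(1 / p5)          # about 3.5 million, i.e. "one in 3.5 million"
print(p_to_sigma(p5))  # back to 5
print(2 * p5)          # two-sided version, about 5.7e-07
```

The two-sided value is shown only for comparison; the five-sigma convention discussed above uses the one-sided tail.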

Here is the code for the graph. I wrote it in a hurry, so any suggestions to make it better are welcome.

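my.color   <- rainbow(10)            # palette; colors 1-5 fill the sigma bands (1 = red)
my.symbol  <- expression(sigma)      # x-axis label (not defined in the original code; assumed)
my.symbol2 <- expression(mu)         # label for the mean on the x axis
my.axis    <- -6:6                   # tick positions, in standard deviations
my.label   <- c('-6','-5','-4','-3','-2','-1', my.symbol2, '1','2','3','4','5','6')

x <- seq(-6, 6, length.out = 600)
y <- dnorm(x)

# Empty plotting region; the shaded bands are added below
plot(x, y, type = "n", xlab = my.symbol, ylab = '', axes = FALSE)

# Shade the area under the N(0,1) density between start and end
plotsigma <- function(start, end, color){
  sigmax <- seq(start, end, length.out = 100)
  sigmay <- c(0, dnorm(sigmax), 0)   # drop the polygon to y = 0 at both ends
  sigmax <- c(start, sigmax, end)
  polygon(sigmax, sigmay, col = color, border = NA)
}

# Draw the widest band first so the narrower bands are painted on top
for (i in 5:1){
  plotsigma(-i, i, my.color[i])
}

axis(1, at = my.axis, labels = my.label)
lines(x, y)                                  # outline of the density curve
segments(0, dnorm(0), 0, 0, col = 'white')   # white line at the mean
segments(5, 0.2, 5, 0, lty = 3)              # dotted marker at 5 sigma
text(5, 0.22, expression(5 * sigma))         # label the marker "5σ"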

Written by sasandr

July 14, 2012 at 12:08 am

Posted in Misc, Stat


2 Responses


  1. I also found the reporting on this interesting. Glad that you noticed that too.

    Guest

    July 16, 2012 at 2:12 pm

  2. I only just started learning R; I think the code is good. Wish the colors could change a little bit, though.

    R beginner

    July 23, 2012 at 7:34 am

