## Floating point arithmetic

I was once trying different cut-off points to define a dichotomous variable for a logistic regression, and I kept getting erroneous results when I looked at the data print-out. Values that should have been set to “Yes” according to my algorithm fell into the “No” column, and I just couldn’t figure out what went wrong.

Here is a simple example to illustrate the problem. We start with a SOURCE data set containing four variables: x, y, z (which holds x - y), and a constant cut-off value, cutoff.

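| Obs | x | y | x - y | cutoff |
|----:|--:|----:|------:|-------:|
| 1 | 2 | 1.1 | 0.9 | 0.9 |
| 2 | 2 | 1.2 | 0.8 | 0.8 |
| 3 | 2 | 1.3 | 0.7 | 0.7 |
| 4 | 2 | 1.4 | 0.6 | 0.6 |
| 5 | 2 | 1.5 | 0.5 | 0.5 |
| 6 | 2 | 1.6 | 0.4 | 0.4 |
| 7 | 2 | 1.7 | 0.3 | 0.3 |
| 8 | 2 | 1.8 | 0.2 | 0.2 |
| 9 | 2 | 1.9 | 0.1 | 0.1 |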

Next, flag variables are created to indicate whether z (that is, x - y) equals the cut-off (1 for Yes, 0 for No): flag1 compares the raw values, while flag2 first rounds z to one decimal place.

```sas
data FLOAT;
  set source;                         /* z holds x - y */
  /* No rounding */
  flag1 = (z = cutoff);
  /* round() to the rescue */
  flag2 = (round(z, 0.1) = cutoff);
run;
```

The print-out of data set FLOAT shows that with the round() function the flag variable is set correctly, but without rounding the results are erratic. This is because SAS represents numeric values as 64-bit floating-point numbers, and most decimal fractions, such as 0.1, 0.9 and 1.1, have no exact binary representation. As a result, the usual rules of algebra do not always hold: the stored result of 2 - 1.1 is not exactly the stored value of 0.9. A paper from SAS® Institute explains this phenomenon in great detail; you can check it out yourself.

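| Obs | x | y | x - y | cutoff | Without rounding (flag1) | With rounding (flag2) |
|----:|--:|----:|------:|-------:|-------------------------:|----------------------:|
| 1 | 2 | 1.1 | 0.9 | 0.9 | 0 | 1 |
| 2 | 2 | 1.2 | 0.8 | 0.8 | 1 | 1 |
| 3 | 2 | 1.3 | 0.7 | 0.7 | 1 | 1 |
| 4 | 2 | 1.4 | 0.6 | 0.6 | 0 | 1 |
| 5 | 2 | 1.5 | 0.5 | 0.5 | 1 | 1 |
| 6 | 2 | 1.6 | 0.4 | 0.4 | 0 | 1 |
| 7 | 2 | 1.7 | 0.3 | 0.3 | 0 | 1 |
| 8 | 2 | 1.8 | 0.2 | 0.2 | 0 | 1 |
| 9 | 2 | 1.9 | 0.1 | 0.1 | 0 | 1 |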
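The same pattern can be checked outside SAS. Below is a rough Python analogue of the FLOAT data step, offered only as an illustration: Python's float is also a 64-bit IEEE 754 double, and the sketch assumes the y and cutoff values in SOURCE were entered as the decimal literals shown in the print-out. `decimal.Decimal(float)` prints the exact binary value a double holds, which makes the hidden error visible.

```python
from decimal import Decimal

# Rough Python analogue of SOURCE and the FLOAT data step.
# Assumption: y and cutoff hold the decimal literals from the print-out.
x = 2.0
ys = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]
cutoffs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

rows = []
for obs, (y, cutoff) in enumerate(zip(ys, cutoffs), start=1):
    z = x - y
    flag1 = int(z == cutoff)            # no rounding
    flag2 = int(round(z, 1) == cutoff)  # round to one decimal place first
    rows.append((obs, z, cutoff, flag1, flag2))
    # repr(z) shows digits that a formatted print-out hides
    print(obs, y, repr(z), cutoff, flag1, flag2)

# Exact stored values for row 1: the subtraction and the literal land on
# neighboring doubles, so the equality test fails.
print(Decimal(2 - 1.1))
print(Decimal(0.9))
```

On an IEEE 754 machine, flag1 comes out as 0, 1, 1, 0, 1, 0, 0, 0, 0 and flag2 as 1 in every row, the same pattern as the SAS print-out.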

As the example shows, a quick-and-dirty way to circumvent the problem is to use the round() function to fix the precision before comparing.
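Rounding is not the only workaround. A common alternative, not the approach used in this post, is to compare with a tolerance: treat two values as equal when they differ by less than some small epsilon. A minimal Python sketch using `math.isclose`; the tolerance `TOL = 1e-9` is an arbitrary, hypothetical choice and should be set to suit the scale of your data.

```python
import math

x = 2.0
ys = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]
cutoffs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

TOL = 1e-9  # hypothetical absolute tolerance; pick one that suits your data

# Exact comparison: sensitive to the last bit of each double.
exact = [x - y == c for y, c in zip(ys, cutoffs)]

# Tolerance comparison: equal if |a - b| <= TOL (relative tolerance disabled).
tolerant = [math.isclose(x - y, c, rel_tol=0.0, abs_tol=TOL)
            for y, c in zip(ys, cutoffs)]

print(exact)
print(tolerant)
```

Unlike `round(z, 0.1)`, a tolerance does not assume the cut-offs sit on a fixed decimal grid, so it also works for a cut-off such as 0.25.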

On further checking, this is also an issue in R. Without rounding, some of the comparisons do not return TRUE, even though x - y and z print as the same values.

```r
> x <- rep(2, 9)
> y <- seq(1.1, 1.9, by=0.1)
> z <- seq(0.9, 0.1, by=-0.1)
> x-y
[1] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1
> z
[1] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1
> # Without rounding
> x-y == z
[1] FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE
>
> # With rounding
> round(x-y, 1) == round(z, 1)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
```
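The R results differ from the SAS print-out in row 2. One plausible explanation, sketched here in Python, is that `seq()` does not produce the literals 1.2 or 0.8 but the results of further floating-point arithmetic. The sketch assumes that R's `seq(from, to, by)` computes each element as `from + i * by` and clamps any overshoot at `to`; the helper `seq_by` is hypothetical, written only for this illustration.

```python
def seq_by(start, stop, step):
    """Hypothetical emulation of R's seq(start, stop, by=step):
    element i is start + i * step, with any overshoot clamped at stop."""
    n = int((stop - start) / step + 1e-10)  # number of steps, with fuzz
    out = [start + i * step for i in range(n + 1)]
    if step > 0:
        return [min(v, stop) for v in out]
    return [max(v, stop) for v in out]

x = [2.0] * 9
y = seq_by(1.1, 1.9, 0.1)
z = seq_by(0.9, 0.1, -0.1)

# Without rounding
print([xi - yi == zi for xi, yi, zi in zip(x, y, z)])
# With rounding
print([round(xi - yi, 1) == round(zi, 1) for xi, yi, zi in zip(x, y, z)])
# 1.1 + 1 * 0.1 is not the double nearest to 1.2
print(repr(y[1]), y[1] == 1.2)
```

Under that assumption, the exact comparison reproduces the R output above (TRUE only in positions 3 and 5), and the rounded comparison is TRUE everywhere.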

So the take-home lesson is: always check your calculations when dealing with decimal values, and use rounding or truncation functions to control precision before comparing them. And remember: even though your numbers or data are continuous, the computer only recognizes 0 and 1, so every value is stored as a finite binary approximation.
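To see the 0s and 1s behind a decimal number, Python's `float.hex()` prints the exact stored value of a 64-bit double in hexadecimal, where each hex digit stands for four bits. A short illustrative sketch:

```python
# float.hex() shows the exact bits of a 64-bit double in hexadecimal.
# 0.1 is a repeating binary fraction (0x1.999...), cut off and rounded
# to fit the 52 stored fraction bits.
print((0.1).hex())

# The literal 0.9 and the result of 2 - 1.1 are different doubles:
# their hex strings differ only in the last digit.
print((0.9).hex())
print((2 - 1.1).hex())

# float.fromhex() turns the hex string back into the identical double.
print(float.fromhex((0.1).hex()) == 0.1)
```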

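### Comments

I ran into the same situation recently, and I also scratched my head and couldn't figure out where it went wrong. The main problem is not knowing when this issue will show up.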

xiaogao, July 9, 2012 at 11:03 pm

Always be defensive. 🙂

sasandr, July 12, 2012 at 2:02 pm

I like the way you post code. How do you post data and code on your WordPress blog?

HG, July 10, 2012 at 10:42 am

Check out the sourcecode tag.

sasandr, July 12, 2012 at 2:02 pm