SAS® and R

Best of Both Worlds

Floating point arithmetic

with 4 comments

One time I was trying different cut-off points for classification to define a dichotomous variable for logistic regression, and I kept getting erroneous result when I looked at the data print-out. Values which should have been set to “Yes” according to my algorithm fell into the “No” column, and I just couldn’t figure out what went wrong.

Here is a simple example to illustrate my problem. We start with a SOURCE data set with four variables X, Y, (X-Y) and a certain constant as the cut-off value.

Floating-Point Arithmetic

Obs    x     y     x - y    cutoff
 1     2    1.1      0.9      0.9
 2     2    1.2      0.8      0.8
 3     2    1.3      0.7      0.7
 4     2    1.4      0.6      0.6
 5     2    1.5      0.5      0.5
 6     2    1.6      0.4      0.4
 7     2    1.7      0.3      0.3
 8     2    1.8      0.2      0.2
 9     2    1.9      0.1      0.1

Now a flag variables is created to indicate if (x-y) is equal to the cut-off (1 for Yes, 0 for No).

data FLOAT;
  set source;
  /* No rounding */
  flag1 = (z = cutoff);
  /* round() to the rescue */
  flag2 = (round(z,0.1) = cutoff);
run;

The print-out of dataset FLOAT shows that with round() function, flag variable is set correctly; but we get erratic result sans rounding. This is because in SAS, numeric values are represented as 64-bit floating point numbers, and rules of algebra may not apply to floating point numbers. A paper from the SAS® Institute explains this phenomena in great details. You can check it out yourself.

                                       Without      With
Obs    x     y     x - y    cutoff    rounding    rounding

 1     2    1.1      0.9      0.9         0           1
 2     2    1.2      0.8      0.8         1           1
 3     2    1.3      0.7      0.7         1           1
 4     2    1.4      0.6      0.6         0           1
 5     2    1.5      0.5      0.5         1           1
 6     2    1.6      0.4      0.4         0           1
 7     2    1.7      0.3      0.3         0           1
 8     2    1.8      0.2      0.2         0           1
 9     2    1.9      0.1      0.1         0           1

And as you see from the above example, to circumvent this problem, the quick and dirty way is using round() function to set precision before comparison.

And upon further checking, this is also an issue in R. You can see that, without rounding function, values for some of the comparisons are not returning “TRUE” even though on paper (x-y) and z might look the same.

> x <- rep(2, 9)
> y <- seq(1.1, 1.9, by=0.1)
> z <- seq(0.9, 0.1, by=-0.1)
> x-y
[1] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1
> z
[1] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1
> # Without rounding
> x-y == z
[1] FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE
>
> # With rounding
> round(x-y, 1) == round(z, 1)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

So the take home lesson here is always check your calculation when you are dealing with values with decimal points, use various rounding and truncation functions to ensure decimal precision. And do remember, even though your number/data is continuous, computer only recognizes 0 and 1.

Advertisements

Written by sasandr

July 6, 2012 at 2:26 pm

Posted in R, SAS

Tagged with , ,

4 Responses

Subscribe to comments with RSS.

  1. 我最近碰到过同样情况,也是抓破头都想不出来哪儿出错了,主要是不知道什么时候会出现这个问题。

    xiaogao

    July 9, 2012 at 11:03 pm

  2. I like the way you post code. How do you post data, and code on your wordpress blog?

    HG

    July 10, 2012 at 10:42 am


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: