SAS® and R

Best of Both Worlds

Freakonomics survey


Freakonomics just put up a survey: Which social science should die?

The four candidates for elimination are psychology, political science, economics, and sociology. If you ask me what these four disciplines have in common, at the top of my list would be that they all tend to misuse statistical tests of significance, with sociology and political science leading the charge. I’d be interested to know the survey result. However, for the survey to be meaningful in any discussion, I think they should take the survey taker’s profession into consideration.

Please go take the survey; the result is out next week.


Written by sasandr

July 5, 2012 at 11:05 am

Posted in Misc, Stat


Djokovic, Federer drawn to meet in Wimbledon semis


I thought I’d update my old post about the men’s (in Wimbledon’s case, gentlemen’s) draw, since we now have one more data point. The Wimbledon draw is out – as the Associated Press puts it – “Random as Grand Slam tournament draws are meant to be, Novak Djokovic and Roger Federer keep bumping into each other in major semifinals, and it could happen again at Wimbledon.” Could it be anything but random?

Draw data compiled from Wikipedia.

So Federer and Djokovic somehow keep ending up in the same half: 19 times out of 27 draws in the past seven years. Statistically speaking, if each draw were a fair 50/50 split, what is the probability of the two landing in the same half 19 or more times out of 27? How about less than 3%?
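The "less than 3%" figure can be checked with an exact binomial tail. The sketch below is not necessarily how the number above was produced; it assumes each draw is an independent 50/50 event, which is a simplification. PROBBNML(p, n, m) returns P(X <= m), so the upper tail is one minus the cumulative probability at 18.

```sas
/* Exact upper-tail probability under an assumed fair, independent draw */
data _null_;
   n = 27;      /* number of Grand Slam draws considered */
   k = 19;      /* draws with both players in the same half */
   p = 0.5;     /* assumed chance of sharing a half in any one draw */

   /* P(X >= k) = 1 - P(X <= k - 1) */
   p_upper = 1 - probbnml(p, n, k - 1);

   put p_upper= percent8.2;
run;
```

Under these assumptions the tail works out to roughly 2.6%, consistent with the figure quoted above.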

Written by sasandr

June 22, 2012 at 10:59 pm

Posted in Stat


What makes you smart can also make you stupid


An interesting article from The New Yorker. [Read more…]

Written by sasandr

June 12, 2012 at 2:12 pm

Posted in Misc


SAS tip: Save and load system options


It is good statistical programming practice to delete all temporary data sets at the end of a macro run, not only to free up work space and memory, but also to reduce the chance of errors, such as reusing a data set from a previous run, and of warnings, such as naming conflicts.

But how about system options? Inside a macro, you might need to change several system option settings, and it would be a hassle to reset them to their original state one by one. Thankfully, we have the OPTSAVE and OPTLOAD procedures at our disposal, which save SAS option values and restore them at a later time. In the following example, the OPTSAVE procedure writes the values of all the SAS options that can be altered from within a SAS session to a SAS data set [your_saved_options]. The OPTLOAD procedure later restores the SAS session option values from the [your_saved_options] data set.

/* Save your SAS system options */
proc optsave out=[your_saved_options];
run;

/* Reload your SAS system options */
proc optload data=[your_saved_options];
run;

This way, SAS options can be reset back to the original values, before exiting the macro.
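To show where the two procedures sit inside a macro, here is a minimal sketch. The macro name, the data set name work._saved_opts, and the options being changed are all hypothetical choices made for illustration.

```sas
%macro demo_restore_options;
   /* Save the current option settings (data set name is hypothetical) */
   proc optsave out=work._saved_opts;
   run;

   /* Change whatever options the macro needs; these two are examples */
   options nonotes nosource;

   /* ... the real work of the macro would go here ... */

   /* Restore the option settings saved at the top of the macro */
   proc optload data=work._saved_opts;
   run;

   /* Delete the temporary data set that held the saved options */
   proc datasets library=work nolist;
      delete _saved_opts;
   quit;
%mend demo_restore_options;

%demo_restore_options
```

Deleting the saved-options data set at the end keeps the macro in line with the cleanup practice described at the top of this post.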

Written by sasandr

June 6, 2012 at 4:08 pm

Posted in SAS


China stocks fall bizarre 64.89 points on June 4, ’89 anniversary


What are the chances?

CNBC screenshot


From Reuters:

BEIJING (Reuters) – China’s censors blocked access to the term “Shanghai stock market” on popular microblogs on Monday after the index fell a bizarre 64.89 points on the anniversary of the bloody June 4, 1989, crackdown on pro-democracy protesters in Tiananmen Square. [Read more…]

Written by sasandr

June 4, 2012 at 10:05 am

Posted in Misc


Roland Garros 2012


The French Open draw is out. What’s new? Not much. You get the same old, same old Nadal/Murray vs. Djokovic/Federer setup. So what are the odds of Djokovic and Federer always landing in the same half? That’s worth looking into.

For those who are not familiar with the draw process: in Grand Slam tennis, there are 128 players in the main draw. After the seeding is decided, the top two seeds are placed in opposite halves of the draw, ensuring that the best players can meet only in the final. Since Federer and Djokovic were never the top two seeds at the same Grand Slam tournament, theoretically speaking, Djokovic has a 50% chance of being drawn in Federer’s half (and vice versa). This seems to be the case in 2006 and most of 2007, before Djokovic rose to #4 in the rankings. After Djokovic became a consistent presence in the top 4, and therefore a credible contender for the Grand Slam titles, he curiously appeared in Federer’s half most of the time. As you can see in the graph, during the four-year span from 2008 to 2011, they were in different halves only twice in 16 draws. The proportion is even higher on the so-called fast courts (12 out of 12). It seems that only on clay (the French Open) do the two have a more even chance of facing each other in the final.

Draw data compiled from Wikipedia

More astonishingly, since Wimbledon 2008, they have been drawn in the same half seven consecutive times. So what’s the probability of that? It’s difficult to calculate exactly, since these events are not all independent. But I did a very crude simulation based on the binomial distribution. It seems that being on the same side of the draw 18 out of 26 times is quite rare, as is being in the same half 7 times in a row (< 5%). This, combined with a study conducted by ESPN, leaves conspiracy-minded tennis buffs wanting to know whether the draw is indeed fixed to make the game more exciting, or whether this is really nothing more than the occurrence of small-probability events.
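For readers who want to try a crude simulation of this kind, the sketch below shows one way to set it up. It is not the simulation used for this post: it assumes 26 independent draws with a 50% chance each, and the seed, the number of replications, and all variable names are arbitrary choices.

```sas
/* Crude simulation: 26 independent 50/50 draws per replication (assumed) */
data sim;
   call streaminit(2012);                  /* arbitrary seed for repeatability */
   do rep = 1 to 100000;
      same = 0;                            /* draws with both in the same half */
      run_len = 0;                         /* current streak of same-half draws */
      max_run = 0;                         /* longest streak in this replication */
      do draw = 1 to 26;
         hit = rand('BERNOULLI', 0.5);
         same = same + hit;
         if hit then run_len = run_len + 1;
         else run_len = 0;
         max_run = max(max_run, run_len);
      end;
      ge18 = (same >= 18);                 /* 1 if at least 18 of 26 */
      run7 = (max_run >= 7);               /* 1 if any streak of 7 or more */
      output;
   end;
   keep rep same max_run ge18 run7;
run;

/* The means of the 0/1 flags estimate the two probabilities */
proc means data=sim mean maxdec=4;
   var ge18 run7;
run;

/* Exact binomial tail for comparison with the simulated ge18 proportion */
data _null_;
   p_exact = 1 - probbnml(0.5, 26, 17);
   put p_exact= percent8.2;
run;
```

The exact tail for 18 or more out of 26 is about 3.8% under these assumptions. The streak estimate depends on how "7 in a row" is defined (a streak anywhere in the 26 draws, as coded here, versus one specific stretch of seven draws), so it need not match the figure quoted above.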

Written by sasandr

May 28, 2012 at 11:39 pm

Posted in Stat


All those CATs


The title is a bit misleading since we are not talking about the furry animals here. This post is about the conCATenation functions in SAS, which are useful, but CAN get hairy at times.

The word concatenate comes directly from Latin concatenare, which in turn is formed from “con-,” meaning “with” or “together,” and “catena,” meaning “chain.” And the CATx functions, in short, “chain” words or numbers together to make a new string. The different CATx functions dictate how you want to chain them together.

There are five concatenation functions in SAS (CAT, CATS, CATT, CATX, CATQ). The main difference among these functions involves the handling of leading/trailing blanks as well as separator characters between the concatenated items. Old-school SAS programmers might still prefer the traditional concatenation operator (||), but the new functions take far less work to accomplish the same task. The following output gives an overview of the differences between the CATx functions.

As shown below, we have four strings (with brackets on both sides to show the leading/trailing blanks), and you can see the resulting strings from the different CATx functions (with a period at the end to show where each string ends). One advantage of the CATx functions is that the items to be concatenated may be character or numeric. If you include numeric values, they are treated as if they were character values, and no numeric-to-character conversion messages are printed to the SAS log.

  All Those CATx, MEOW!
 
String 1 has no leading/trailing blank            [ALL]
String 2 has three trailing blanks                [THOSE   ]
String 3 has two leading blanks                   [  CATS]
String 4 has three leading and two trailing blanks[   MEOW  ]
 
 
- CAT acts like '||', with a minor difference.¹
cat(String1,String2,String3,String4)              ALLTHOSE     CATS   MEOW  .
 
- CATS removes leading and trailing blanks.
cats(String1,String2,String3,String4)             ALLTHOSECATSMEOW.
cats(12,34,56,78)                                 12345678.
 
- CATT only trims trailing blanks.
catt(String1,String2,String3,String4)             ALLTHOSE  CATS   MEOW.
 
- CATX trims both leading and trailing blanks, and inserts separator character.
catx(' ',String1,String2,String3,String4)         ALL THOSE CATS MEOW.
catx(',',String1,String2,String3,String4)         ALL,THOSE,CATS,MEOW.
catx('-',908,782,6562)                            908-782-6562.
 
- CATQ joins strings together as defined by the modifier.²
catq(' ',String1,String2,String3,String4)         ALL "THOSE   " "  CATS" "   MEOW  ".
catq('a',String1,String2,String3,String4)         "ALL" "THOSE   " "  CATS" "   MEOW  ".
catq('s',String1,String2,String3,String4)         ALL THOSE CATS MEOW.
catq('as',String1,String2,String3,String4)        "ALL" "THOSE" "CATS" "MEOW".
catq('asd','~~', String1,String2,String3,String4) "ALL"~~"THOSE"~~"CATS"~~"MEOW".
 
¹ The default length of the result when you use the || operator is the sum of
  the lengths of the strings being concatenated; the default length of the
  result when you use the CAT function is 200.
² Check SAS Language Reference for details on all the available modifiers.
 
Remember to define the string length before calling CATx;
otherwise, the default length of the resulting string is 200.

For a complete list of CATQ modifiers, see the CATQ function entry in the SAS Language Reference.

Please note that it is always good practice to initialize your character variables by specifying their lengths with a LENGTH statement directly after the DATA statement, and to make each created variable long enough to hold the longest string the concatenation can produce. If the variable is too short, the result is truncated or set to a blank value, depending on the calling environment, and SAS writes a warning to the log.
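As a minimal sketch of that practice, the DATA step below declares the result lengths first and then calls several of the CATx functions on the same four strings used in the overview. The data set name, the variable names, and the lengths (40 and 12) are illustrative assumptions, not requirements.

```sas
data cat_demo;
   /* Declare result lengths before the results are first assigned */
   length r_cat r_cats r_catt r_catx $ 40 phone $ 12;

   /* The four example strings, with their leading/trailing blanks */
   string1 = 'ALL';
   string2 = 'THOSE   ';
   string3 = '  CATS';
   string4 = '   MEOW  ';

   r_cat  = cat(string1, string2, string3, string4);        /* keeps all blanks */
   r_cats = cats(string1, string2, string3, string4);       /* strips both sides */
   r_catt = catt(string1, string2, string3, string4);       /* trims trailing only */
   r_catx = catx(' ', string1, string2, string3, string4);  /* strips, adds separator */

   /* Numeric arguments are accepted without conversion notes in the log */
   phone = catx('-', 908, 782, 6562);

   /* Write the results to the log, one per line */
   put r_cat= / r_cats= / r_catt= / r_catx= / phone=;
run;
```

The values written to the log can be compared against the overview above. Shortening one of the declared lengths is an easy way to see how SAS reports a result that does not fit.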

Written by sasandr

May 22, 2012 at 10:55 am

Posted in SAS
