SAS Macro Variables: How to Take a Mean in SAS

I just thought I'd do a quick post about SAS macro variables. They're a pretty important part of SAS, but when I was just starting out people told me not to worry about them and to just enter values by hand. After I finally got tired of constantly looking up results and entering them into programs over and over again, I decided I had better figure out how real programmers did it. As a side note, I've had a couple of people tell me SAS is dying out, but I see it appears to be in the top 20 programming languages, so I guess it's going strong.

So SAS is very good at working on rows of data but not quite so good at working with columns. For example, if you have a column of tree heights and want to standardize by the mean and standard deviation, there is no easy way to do this inside a data step without using macros or manually getting the mean and entering it into your program (this works the first couple of times but it gets old quickly, trust me).

Macros seem to me to be sort of a patch that sits on top of the main SAS program to allow this sort of thing. The macro processor works by scanning through your program for special commands and replacing text before sending the result on to the real SAS program. In effect, the macro processor writes your program for you.
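Here is about the smallest example of that text replacement I can think of. It's just a sketch, and greeting is a made-up macro variable name that has nothing to do with the tree example:

```sas
/* greeting is a hypothetical macro variable, used only for illustration */
%let greeting = Hello;

/* The macro processor swaps &greeting for its stored text,
   then %put writes the resulting line to the log */
%put &greeting, world;
```

The log should show Hello, world. The replacement happens before %put does anything, and that is the same mechanism the tree example relies on.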

So back to our example of trees. Here is some data:

SAS:
data trees;
/* column input: name in columns 1-8, height in columns 9-11 */
input name $1-8 height 9-11;
cards;
Maple   123
Oak     78
Birch   90
Elm     155
Poplar  65
;
run;

I never actually use the cards statement in real code but it's handy for portable examples. Also, the spaces between the two columns are significant, since we told SAS to look in the 1st through 8th columns ($1-8) for the tree name and in the 9th through 11th columns (9-11) for the height.

So we now need to find the mean and s.d. of the heights:

SAS:
proc means data=trees mean std;
var height;
output out=meansd mean=meanheight std=sdheight;
run;
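If you want to see what PROC MEANS actually wrote before going any further, you can print the output dataset. This is only a quick check and assumes the meansd dataset from the step above exists:

```sas
/* Print the one-row summary dataset created by OUTPUT OUT=meansd */
proc print data=meansd;
run;
```

Besides meanheight and sdheight, the listing should include the automatic _TYPE_ and _FREQ_ variables that PROC MEANS adds to its output datasets. For the five trees here the mean works out to 102.2 and the standard deviation to roughly 36.5.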

Now our mean and standard deviation are stored in the meanheight and sdheight columns of the meansd dataset. So to get them into the macro part of SAS we need to use call symput(anyname,column). You can only use call symput inside a data step. This is a little silly since we don't actually want to do anything with the data, but I don't make the rules. Luckily, SAS does provide the _NULL_ dataset, which just dumps whatever you put in it after you finish the data step. SAS variables often have extra spaces attached, so it's probably smart to stick a trim(left(column)) in the call symput.

SAS:
data _null_;
set meansd;
call symput('treemean',trim(left(meanheight)));
call symput('treesd',trim(left(sdheight)));
run;

So after call symput we can access the stored values with &treemean and &treesd. They're called macro variables, by the way. If you ever need to check the contents of your macro variables, you can use %PUT _USER_; to write all the ones you've created to the log. So now we can make a column of our standardized tree heights:

SAS:
data trees;
set trees;
standardized=(height-&treemean)/&treesd;
run;

Now the trees dataset contains the standardized tree heights. This is a lot easier than looking in the results and entering it manually every time something changes.
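One possible shortcut: call symputx (I believe it arrived in SAS 9, so treat that as an assumption for your install) strips leading and trailing blanks on its own, so the trim(left()) wrapper isn't needed. A minimal sketch, assuming the meansd dataset from earlier still exists:

```sas
/* Same idea as the call symput step, but call symputx
   removes leading and trailing blanks itself */
data _null_;
  set meansd;
  call symputx('treemean', meanheight);
  call symputx('treesd', sdheight);
run;

/* Write just these two macro variables to the log as a check */
%put treemean=&treemean treesd=&treesd;
```

It's a drop-in swap, so the standardizing data step stays exactly the same.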