SAS Macros: Letting SAS Do the Typing

I've been meaning for quite a while to write up a bit on using macros in SAS to complement my previous post on macro variables. Luckily, Norwegian guy reminded me about the pain of starting out programming in SAS and gave me some motivation. So here's my take on using macros in programming.

So what is a macro? The macro facility is a part of SAS that looks through your code before the normal part of SAS sees it and writes out code for you based on a special syntax. If you've ever found yourself copying and pasting code, then you've probably been in a situation well suited for macros. They're also great if you need to run different code under different conditions. Once I learned macros, SAS seemed a lot more like a usable (although weird) programming language and tasks seemed to get a lot easier (except actually picking the statistical techniques to use).

Probably the easiest way to see what macros do is with an example. So say we once again have a data set of tree heights:

SAS:
  data trees;
  input name:$8. height;
  cards;
  Maple 123
  Maple 78
  Maple 90
  Elm 155
  Elm 65
  Elm 90
  Elm 120
  Birch 100
  Birch 30
  Maple 111
  run;

I already talked about how to find and use the mean and standard deviation for the whole data set. Now what if we wanted to standardize each species by its own separate mean and standard deviation? We could cut and paste, but once we get a few more species or want to change something later this really becomes a hassle. So this is where macros come in.

The first thing to do is to calculate the mean and standard deviation for each species. We can use proc means again to do this. Since we won't be using the printed output I'll add the noprint option, and since we only want the statistics for the individual species and not the whole dataset I'll add the nway option. The class name; statement tells SAS to find the statistics separately for each species, and the output statement tells SAS to save the mean and standard deviation in a dataset called meansd.

SAS:
  proc means data=trees nway noprint;
  class name;
  var height;
  output out=meansd mean=meanheight std=sdheight;
  run;

Now we just need to get the values from the meansd dataset into macro variables. We'll use the _NULL_ dataset and call symput again to create macro variables. This time we need to create separate macro variables for each species. Luckily SAS keeps an automatic variable called _N_ that counts the passes through the data step, which here is the same as the observation number. Since each line of the dataset corresponds to a tree species, we can easily use this identifier to create the macro variables by using call symput('mean'||left(_N_), meanheight);. The left() and trim() functions remove unnecessary spaces (numeric values converted to text have extra spaces on the left and string variables have spaces on the right) and the || concatenates (connects) the text "mean" with the line number to give mean1, mean2, etc. I'll do the same thing for standard deviation and tree name. Once the macro variables are created, there is still one problem remaining. We don't know how many species there were or how many macro variables were created. Luckily SAS will create a temporary variable that flags the last line of the dataset when it sees end=variablename following a set statement. Then we just need to check if SAS is on the last line and, if so, save the line number (_N_) to know the number of species of trees.

SAS:
  data _NULL_;
  set meansd end=last;
  /* one set of macro variables per species: mean1, sd1, name1, ... */
  call symput('mean'||left(_N_),meanheight);
  call symput('sd'||left(_N_),sdheight);
  call symput('name'||left(_N_),trim(name));
  /* on the last observation, _N_ is the number of species */
  if last then call symput('numspecies',_N_);
  run;

If you ever want to check what macro variables you have in your program, you can use %PUT _USER_; to print them all to the log file. Or if you want to see every macro variable available (SAS has quite a few automatic ones like operating system and date) use %PUT _ALL_;. Inserting %PUT _USER_; here produces:

SAS:
  GLOBAL NUMSPECIES            3
  GLOBAL NAME1 Birch
  GLOBAL NAME2 Elm
  GLOBAL NAME3 Maple
  GLOBAL MEAN1           65
  GLOBAL MEAN2        107.5
  GLOBAL MEAN3        100.5
  GLOBAL SD1 49.497474683
  GLOBAL SD2 38.837267326
  GLOBAL SD3 20.273134933
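The padding in front of values like MEAN1 and NUMSPECIES comes from SAS converting numbers to text. As a rough sketch of a variation (not what produced the log above), call symputx strips leading and trailing blanks from the value, and cats() builds the macro variable name without needing left(). Both are standard SAS routines, but treat this as an alternative to try rather than the method used in this post:

```sas
/* Variation on the step above: same macro variable names, values stored without padding */
data _NULL_;
  set meansd end=last;
  /* cats() concatenates and strips blanks, so 'mean' and _N_ give mean1, mean2, ... */
  call symputx(cats('mean', _N_), meanheight);
  call symputx(cats('sd', _N_), sdheight);
  call symputx(cats('name', _N_), name);
  /* on the last observation, _N_ is the number of species */
  if last then call symputx('numspecies', _N_);
run;

/* list the user-defined macro variables in the log */
%PUT _USER_;
```

If you run this instead of the original step, the macro variables have the same names but the numeric values no longer carry the leading spaces.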

Now we've set a lot of macro variables but we still haven't created a real macro. In SAS, macros are started with %MACRO macroname; and finished with %MEND; (short for M[acro]END). %'s are used to indicate commands that the SAS macro facility will read and remove before normal SAS sees the code. Anything without a % (or an & for macro variables) is passed along by the macro facility as ordinary SAS code. Macros don't spit out their code for SAS until they are called using %macroname.

So I'll call my macro treestandardizer but you can call it whatever you want. I'm going to use a pretty simple and specific macro, but if you were going to use this often and for different datasets you would want to program it better. The first thing to do is create the final dataset and set it to the trees dataset. Since we need to loop through each species of tree, we'll need a %DO loop. Everything between %DO and %END will be repeated while i increments from 1 to the number of tree species. If you want to combine text and a macro variable to reference another macro variable, you use the double ampersand && in SAS. For example, we want to get the mean for species 1 by looking in the macro variable &mean1, so we use &&mean&i. I think the macro processing part of SAS ends up running through the code twice: the first time it replaces && with & and &i with 1 to leave &mean1, and the second time it finds &mean1 and pastes in the appropriate value (65). So we'll have the do loop write out a series of if statements to check what the name of the tree is and use the appropriate mean and standard deviation. Note that when using a string macro variable like &nameX, you need to surround it with double quotes so SAS doesn't think it is a variable name (it has to be double quotes because the macro processor doesn't look inside single quotes).
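To see the double ampersand on its own before it goes into the macro, here is a tiny stand-alone sketch. The macro variables k and demo2 are made-up names for illustration only, chosen so they don't overwrite anything created earlier; the symbolgen option asks SAS to write each resolution step to the log.

```sas
/* hypothetical macro variables, just for this demonstration */
%let k = 2;
%let demo2 = Elm;

/* show each step of macro variable resolution in the log */
options symbolgen;

/* first pass: && becomes & and &k becomes 2, leaving &demo2; second pass resolves &demo2 */
%put &&demo&k;

options nosymbolgen;
```

The %put line writes Elm to the log, the same way &&name&i ends up as a species name inside the loop.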

SAS:
  %MACRO treestandardizer;
  data final;
  set trees;
  /* the loop writes one if statement per species */
  %DO i = 1 %TO &numspecies;
  if name="&&name&i" then stheight=(height-&&mean&i)/&&sd&i;
  %END;
  run;
  %MEND;

The previous code prepared the macro but nothing actually happens until we call it using %treestandardizer. Unlike almost everything else in SAS this line doesn't have to end in a semicolon (although it's pretty unlikely to hurt if you forget and add one). So to call the macro:

SAS:
  %treestandardizer

If you want to see what happens when you call a macro, you can have SAS print the code generated by the macro to the log file by setting options mprint; (make sure to set it before actually calling the macro). In this case, it gives:

SAS:
  MPRINT(TREESTANDARDIZER):   data final;
  MPRINT(TREESTANDARDIZER):   set trees;
  MPRINT(TREESTANDARDIZER):   if name="Birch" then stheight=(height- 65)/49.497474683;
  MPRINT(TREESTANDARDIZER):   if name="Elm" then stheight=(height- 107.5)/38.837267326;
  MPRINT(TREESTANDARDIZER):   if name="Maple" then stheight=(height- 100.5)/20.273134933;

So it worked and we now have the standardized heights in the stheight column of the final dataset. This particular example could be done a few different ways (the easiest and probably better way being to merge the meansd dataset with the trees dataset) but I hope it gives a decent introduction to SAS macros. If you have any specific questions or something wasn't clear, feel free to ask in a comment.
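For comparison, the merge approach might look roughly like the sketch below. The dataset names trees_sorted and final_merged are placeholders I made up for this example; trees and meansd are the datasets created earlier. A merge with a by statement needs both datasets sorted by name, and meansd already comes out of proc means in that order.

```sas
/* sort a copy of the tree data so it can be merged by species */
proc sort data=trees out=trees_sorted;
  by name;
run;

data final_merged;
  /* attach each species' mean and standard deviation to its trees */
  merge trees_sorted meansd(keep=name meanheight sdheight);
  by name;
  stheight = (height - meanheight) / sdheight;
  /* the statistics are only needed for the calculation */
  drop meanheight sdheight;
run;
```

This gives the same stheight values as the macro version, just without any macro variables, and it keeps working unchanged when more species are added.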

Here is the SAS source code if you don't feel like copying and pasting.