For quite a while I’ve been meaning to write up a bit on using macros in SAS to complement my previous post on macro variables. Luckily, a Norwegian guy reminded me of the pain of starting out programming in SAS and gave me some motivation. So here’s my take on using macros in programming.
So what is a macro? Macros are handled by a part of SAS (the macro facility) that looks through your code before the normal part of SAS sees it and writes out code for you based on a special syntax. If you’ve ever found yourself copying and pasting code, you’ve probably been in a situation well suited for macros. They’re also great if you need to run different code under different conditions. Once I learned macros, SAS seemed a lot more like a usable (although weird) programming language, and tasks seemed to get a lot easier (except actually picking the statistical techniques to use).
Probably the easiest way to see what macros do is an example. So say we once again have a data set of tree heights:

```sas
data trees;
  input name:$8. height;   /* species name (up to 8 characters) and height */
  cards;
Maple 123
Maple 78
Maple 90
Elm 155
Elm 65
Elm 90
Elm 120
Birch 100
Birch 30
Maple 111
;
run;
```

I already talked about how to find and use the mean and standard deviation for the whole data set. Now what if we wanted to standardize each species by its own separate mean and deviation? We could cut and paste, but once we get a few more species or want to change something later, this really becomes a hassle. So this is where macros come in.
The first thing to do is to calculate the mean and standard deviation for each species. We can use proc means again to do this. Since we won’t be using the printed output, I’ll add the noprint option, and since we only want the means for the individual species and not the whole dataset, I’ll add the nway option. The class name; statement tells SAS to find the statistics separately for each species, and the output line tells SAS to save the mean and deviation in a dataset called meansd.
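A minimal sketch of this step, assuming the trees dataset above. The output variable meanheight is the name used in the call symput discussed below; sdheight is an assumed name for the standard deviation:

```sas
/* Per-species mean and standard deviation of height.              */
/* meanheight matches the text; sdheight is an assumed name.       */
proc means data=trees noprint nway;
  class name;                 /* one row of statistics per species */
  var height;
  output out=meansd mean=meanheight std=sdheight;
run;
```

Without nway, meansd would also contain an extra row for the whole dataset (the one with _TYPE_ = 0).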
Now we just need to get the values from the meansd dataset into macro variables. We’ll use a DATA _NULL_ step and call symput again to create the macro variables. This time we need to create separate macro variables for each species. Luckily SAS automatically counts the iterations of a DATA step in an automatic variable called _N_ (it isn’t saved in the dataset, but in a simple step that just reads a dataset with set, it equals the observation number). Since each line of the dataset corresponds to a tree species, we can easily use this counter to create the macro variables with call symput('mean'||left(_N_), meanheight);. The || operator concatenates (connects) the text “mean” with the line number to give mean1, mean2, etc., and the left() and trim() functions remove any unnecessary spaces (numbers converted to text get extra spaces on the left, and character values are padded with spaces on the right). I’ll do the same thing for the standard deviation and the tree name. Once the macro variables are created, there is still one problem remaining: we don’t know how many species there were or how many macro variables were created. Luckily SAS will create another variable that flags the last line of the dataset when it sees end=newvariablename following a set statement. Then we just need to check whether SAS is on the last line and, if so, save the line number (_N_) to know the number of species of trees.
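A sketch of this step, assuming meansd holds the variables name, meanheight and sdheight. The names sdheight, sd1, sd2 and so on, and nspecies are assumptions for illustration. It swaps in CALL SYMPUTX (SAS 9 and later) for call symput, because SYMPUTX left-aligns and trims both the macro variable name and its value itself. That makes the left()/trim() clean-up unnecessary, and CATS() glues the prefix to _N_ without extra blanks:

```sas
/* Copy each row of meansd into numbered macro variables:         */
/* name1/mean1/sd1 for the first species, name2/mean2/sd2, ...    */
data _null_;
  set meansd end=lastrow;      /* lastrow = 1 only on the final row */
  call symputx(cats('name', _n_), name);
  call symputx(cats('mean', _n_), meanheight);
  call symputx(cats('sd',   _n_), sdheight);
  /* On the last row _n_ equals the number of species. */
  if lastrow then call symputx('nspecies', _n_);
run;
```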
If you ever want to check which macro variables you have in your program, you can use %PUT _USER_; to print them all to the log. Or, if you want to see every macro variable available (SAS has quite a few automatic ones, such as the operating system and the date), use %PUT _ALL_;. Inserting %PUT _USER_; at this point lists each of the macro variables we just created along with its value.
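For example, assuming the name1 and mean1 macro variables described above have been created, you can dump everything or print just the ones you care about:

```sas
%put _user_;                                     /* every user-defined macro variable */
%put NOTE: Species 1 is &name1 with mean &mean1; /* just the values you need          */
```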
Now we’ve set a lot of macro variables, but we still haven’t created a real macro. In SAS, macros are started with %MACRO macroname; and finished with %MEND; (short for M[acro]END). The % signs mark commands that the SAS macro facility will read and remove before normal SAS sees the code. Anything without a % is written out by the macro facility as ordinary SAS code (with any & macro variable references filled in). Macros don’t spit out their code for SAS until they are called using %macroname.
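A minimal, self-contained illustration of that split (the macro name demo and the dataset it creates are hypothetical). The %put statement is handled by the macro processor itself, while the DATA step text is written out for SAS to run, and neither happens until the macro is called:

```sas
%macro demo;
  /* A macro statement: executed by the macro processor, never seen by SAS */
  %put NOTE: Macro DEMO is writing out a DATA step.;
  /* Plain text: written out as ordinary SAS code when DEMO is called */
  data demo;
    x = 1;
  run;
%mend;

/* Nothing above has run yet; this call does the work. */
%demo
```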
So I’ll call my macro treestandardizer, but you can call it whatever you want. I’m going to write a pretty simple and specific macro; if you were going to use this often and on different datasets, you would want to program it more generally. The first thing to do is create the final dataset and read the trees dataset into it with a set statement. Since we need to loop through each species of tree, we’ll need a %DO loop. Everything between %DO and %END will be repeated while i increments from 1 to the number of tree species.

If you want to combine text and a macro variable to reference another macro variable, you use the double ampersand && in SAS. For example, we want to get the mean for species 1 from the macro variable &mean1, so we use &&mean&i. The macro processor effectively scans the reference twice. The first time, it replaces && with & and &i with 1, leaving &mean1. The second time, it finds &mean1 and pastes in the appropriate value (65, since proc means sorts the species alphabetically and Birch comes first). So we’ll have the %DO loop write out a series of if statements that check the name of the tree and use the appropriate mean and deviation. Note that when using a character macro variable like &&name&i, you need to surround it with double quotes so SAS treats the value as a text string rather than a variable name. Single quotes won’t work, because the macro processor doesn’t resolve references inside them.
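Putting those pieces together, here is a sketch of what such a macro could look like. It assumes the numbered macro variables nameN, meanN and sdN, plus a count nspecies, have already been created. The names sdN and nspecies are assumptions, since the original code isn’t reproduced here:

```sas
%macro treestandardizer;
  %local i;        /* keep the loop counter out of the global symbol table */
  data final;
    set trees;
    /* One IF statement is written out per species. On the first pass */
    /* the double ampersand collapses to one and i becomes a number;   */
    /* the second pass swaps in the stored mean, sd or name.           */
    %do i = 1 %to &nspecies;
      if name = "&&name&i" then stheight = (height - &&mean&i) / &&sd&i;
    %end;
  run;
%mend;
```

The %local statement is an addition of mine. It stops the macro from overwriting a global macro variable that happens to be called i.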
The previous code defines the macro, but nothing actually happens until we call it by writing %treestandardizer on a line by itself. Unlike almost everything else in SAS, this line doesn’t have to end in a semicolon (although adding one out of habit is unlikely to hurt).
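If you want to see what happens when you call a macro, you can have SAS print the code generated by the macro to the log with the statement options mprint; (make sure to set it before actually calling the macro). Each generated statement then appears in the log prefixed with MPRINT(TREESTANDARDIZER):, so in this case you can check the DATA step and the if statement written out for each species.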
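A short sketch of that sequence, assuming %treestandardizer has been defined as described above:

```sas
options mprint;      /* echo every macro-generated statement to the log */
%treestandardizer
options nomprint;    /* switch the echo off again */
```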
So it worked, and we now have the standardized heights in the stheight column of the final dataset. This particular example could be done a few different ways (the easiest, and probably better, way being to merge the meansd dataset with trees), but I hope it gives a decent introduction to SAS macros. If you have any specific questions or something wasn’t clear, feel free to ask in a comment.
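For comparison, here is a sketch of the merge approach. It assumes meansd contains name, meanheight and sdheight (sdheight is an assumed name). Both datasets have to be sorted by name first. The proc means output from a class statement already is, so only trees needs a proc sort. The names trees_sorted and final_merge are hypothetical:

```sas
/* Sort the raw data by species so it can be merged with meansd. */
proc sort data=trees out=trees_sorted;
  by name;
run;

/* Attach each species' mean and sd to its trees, then standardize. */
data final_merge;
  merge trees_sorted meansd(keep=name meanheight sdheight);
  by name;
  stheight = (height - meanheight) / sdheight;
run;
```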
kishore kumar | 22-May-08 at 8:02 am
What is the use of ‘&&’ in SAS macros?
ScottS-M | 22-May-08 at 10:36 am
The && is sort of a funny SAS way of doing arrays or pointers. When SAS runs into a &&, it turns it into a single & and remembers it needs to look through the code for &’s a second time.

It’s probably easier to show by example. Let’s say you have a bunch of names stored in the macro variables &name1, &name2 … and you want to get the name from the second one, so you’re looking for &name2. To get it, you set %LET i = 2; (in reality you’d probably be looping through or using an if statement, but same idea). So you might try &name&i, but if you do that, SAS will try to fill in both &name and &i at the same time and not give you what you want. However, if you use &&name&i, SAS changes && to & and &i to 2, resulting in &name2. Then SAS runs through the code again, sees &name2 and fills it in with the desired second name.

Let me know if that wasn’t clear; it’s sort of a complex, odd thing.
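A small open-code sketch of that example (Birch and Elm are made-up values for illustration):

```sas
%let name1 = Birch;
%let name2 = Elm;
%let i = 2;

/* Pass 1: the double ampersand becomes one and i becomes 2, leaving name2. */
/* Pass 2: the reference to name2 resolves to Elm.                          */
%put NOTE: &&name&i;
```

The log line reads NOTE: Elm.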
Justin | 14-May-09 at 4:08 pm
For a long time I have lamented the fact that you can’t store a variable outside of a data set, but here you show that it’s very easy. Thanks for the blurb.
Also, in proc means you can use the autoname and autolabel options to the output statement to have SAS automatically create new variable names, e.g.
output out=meansd mean= std= /autoname autolabel;
The syntax is strange, but most of SAS is strange.
In your example it’s no big deal, but if you wanted means and standard deviations for 10 variables, it makes things easier.
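A sketch of what Justin describes. It uses the SASHELP.CLASS sample data that ships with SAS, so there are several analysis variables. With AUTONAME, the output variables get names built from the variable and the statistic, such as Height_Mean:

```sas
/* Mean and standard deviation of three variables by sex in one OUTPUT */
/* statement; AUTONAME builds the variable names, AUTOLABEL the labels. */
proc means data=sashelp.class noprint nway;
  class sex;
  var height weight age;
  output out=classstats mean= std= / autoname autolabel;
run;
```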
Anil Kumar | 20-May-09 at 5:36 am
How do I stop the looping process if an error occurs, using &SYSERR? Suppose I have a loop in a macro which executes 10 times. If an error occurs in the first loop, then the process needs to be stopped and it should not go on to the next loop. Please suggest how to handle the error.
ScottS-M | 20-May-09 at 11:41 am
@anil kumar
You could probably use a %goto statement:
```sas
%MACRO test;
  %DO i=1 %TO 10;
    %PUT Working on &i;
    /* stand-in for a real error check */
    %IF &i = 5 %THEN %DO;
      %PUT Found an error;
      %GOTO ERROR;
    %END;
  %END;
  %ERROR:
%MEND;
```
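Since the question mentions SYSERR: SAS resets the automatic macro variable SYSERR after every step, so a loop can check it and jump out on the first failure. A sketch follows; the macro name runsteps and the placeholder DATA step are hypothetical. It treats warnings as failures too, because SYSERR is 0 only when a step finishes with no errors or warnings:

```sas
%macro runsteps;
  %local i;
  %do i = 1 %to 10;
    /* Placeholder step: replace with the real work for pass i. */
    data step&i;
      x = &i;
    run;
    /* The step above has already run when this check is evaluated. */
    %if &syserr ne 0 %then %do;
      %put ERROR: Step &i failed with SYSERR=&syserr, stopping the loop.;
      %goto exit;
    %end;
  %end;
  %exit:
%mend;

%runsteps
```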
Haes | 28-Oct-09 at 10:26 pm
Hey there. Good tutorial. Just a point of note (which befuddled me for a second): you forgot the ‘run;’ statement at the end of your macro.
ScottS-M | 29-Oct-09 at 11:18 am
@Haes
That I did. Thanks. I’ll stick it in there now.
Toby | 13-Nov-09 at 7:19 pm
Seems to me you have made a mountain out of a molehill with the example. While there are many uses of the macro language in SAS, it is also overused by far too many to solve problems that are better solved in simpler ways.
If I assume your same starting data set, here is a two-step solution:
```sas
Proc Means
  Data = Trees NWay NoPrint ;
  Class Name ;
  Var Height ;
  Output Out = MeanSTD ( Drop = _Type_ _Freq_ )
    Mean = Mean STD = STD ;
Run ;

Proc SQL ;
  Create Table Final As
    Select Trees.Name , Height , Mean , STD ,
           ( ( Height - Mean ) / STD ) As StHeight
    From Trees
      Left Join
      MeanSTD
      On Trees.Name = MeanSTD.Name
    Order By Name ;
Quit ;
```
SAS is a bunch of individual languages held together by data sets. The macro language is nothing more than a way to create “Open Code in Mass Quantity”, which, if used right, can be a lifesaver or a finger-cramp saver (whichever you like). However, it can just as easily be, and more often than not is, a problem maker that stunts the growth of programmers learning the SAS language.
ScottS-M | 13-Nov-09 at 7:43 pm
@Toby
That sounds pretty much like what I said at the end of the post: the easiest and probably better way is to merge the summary dataset with trees.
zc | 24-Dec-10 at 1:29 pm
It was very rewarding reading the interesting macro example and the stimulating comments.
I have an application question; I’m not sure whether you guys know the answer for sure:
How do I obtain the adjusted differences, with 95% confidence intervals on the original scale, for dependent variables that need to be log-transformed before running GLM with a dichotomous predictor variable? We often need to use a GLM model to account for variation from multiple covariates (categorical or continuous). If the data are on the original scale, this is very simple, as these are just the regression coefficients and 95% CIs from GLM. However, sometimes the dependent variable is highly skewed and violates the assumption of normality, so a log transformation is needed before running the GLM. In such cases, the exponentiated regression coefficient is effectively the ratio of the two adjusted (geometric) means on the original scale, so you cannot obtain the confidence limits of the adjusted difference directly.
Srinath | 29-Dec-10 at 4:30 am
Use only one parameter and find the total of two numbers.
Example: x=2 and y=3; find the sum with only one parameter.
Matt | 28-Oct-11 at 10:51 am
Thanks for the quick macro tutorial. Little pieces of information, such as the fact that there is a column created named _N_, are massively useful. SAS help guides sometimes leave these kinds of details out, and it frustrates me beyond belief. Simple example, but this will allow me to branch off into more complex code. Thanks again.
gaurav | 30-Dec-13 at 8:53 am
Use only one parameter and find the total of two numbers.
Example: x=2 and y=3; find the sum with only one parameter.