I just thought I’d do a quick post about SAS macro variables. They’re a pretty important part of SAS, but when I was just starting out people told me not to worry about them and to just enter values by hand. After I finally got tired of constantly looking up results and typing them into programs over and over again, I decided I had better figure out how real programmers did it. As a side note, I’ve had a couple of people tell me SAS was dying out, but I see it appears to be in the top 20 of programming languages so I guess it’s going strong.
So SAS is very good at working on rows of data but not quite so good at working with columns. For example, if you have a column of tree heights and want to standardize by the mean and standard deviation, there is no easy way to do this without using macros or manually getting the mean and entering it into your program (this works the first couple of times but it gets old quickly, trust me).
Macros seem to me to be sort of a patch that sits on top of the main SAS program to allow this sort of thing. Macros work by scanning through your program for special commands and replacing things before sending it on to the real SAS program. In effect, the macro processor writes your program for you.
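To make the "replacing things" idea concrete, here is a tiny sketch that doesn't need any data. The names n and squares are just made up for illustration:
[sas]
/* store the text 5 in a macro variable called n */
%let n = 5;

data squares;
  /* the macro processor swaps &n for 5, so SAS sees: do i = 1 to 5; */
  do i = 1 to &n;
    sq = i * i;
    output;
  end;
run;
[/sas]
Change the %let line and the data step changes with it, without touching the data step itself.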
So back to our example of trees. Here is some data:
[sas]
data trees;
  input name $1-8 height 9-11;
  cards;
Maple   123
Oak      78
Birch    90
Elm     155
Poplar   65
;
run;
[/sas]
I never actually use the cards statement in real code but it’s handy for portable examples. Also, the spaces between the two columns are significant, since we told SAS to look in the 1st through 8th columns ($1-8) for the tree name and in the 9th through 11th (9-11) for the height.
So we now need to find the mean and s.d. of the heights:
[sas]
proc means data=trees mean std;
  var height;
  output out=meansd mean=meanheight std=sdheight;
run;
[/sas]
Now our mean and standard deviation are stored in the meanheight and sdheight columns of the meansd dataset. To get them into the macro part of SAS we need to use call symput('anyname', column), where the first argument is the name of the macro variable in quotes. You can only use call symput in a data step. This is a little silly since we don’t actually want to do anything with the data but I don’t make the rules. Luckily, SAS does provide the _NULL_ dataset, which just throws away whatever you put in it when the data step finishes. Numbers often pick up extra spaces when they are converted to text, so it’s probably smart to stick a trim(left(column)) in the call symput.
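That step could look something like the sketch below. The macro variable names treemean and treesd are the ones used in the rest of the post, and the column names come from the proc means output above:
[sas]
data _null_;
  /* meansd has a single row holding the mean and s.d. */
  set meansd;
  /* copy each value into a macro variable, stripping the padding spaces */
  call symput('treemean', trim(left(meanheight)));
  call symput('treesd', trim(left(sdheight)));
run;
[/sas]
SAS will write a note to the log about converting numeric values to character, which is harmless here.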
So after call symput we can access the stored values with &treemean and &treesd. They’re called macro variables, by the way. If you ever need to check the contents of your macro variables you can use %PUT _USER_; to write them all to the log. So now we can make a column of our standardized tree heights:
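A minimal sketch of that data step, assuming the new column is called standardized (any valid name would do):
[sas]
data trees;
  set trees;
  /* &treemean and &treesd are replaced by the stored numbers before the step runs */
  standardized = (height - &treemean) / &treesd;
run;
[/sas]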
Now the trees dataset contains the standardized tree heights. This is a lot easier than looking in the results and entering it manually every time something changes.
Damwal | 19-Apr-09 at 2:26 pm
Hi! I really like your solution – thanks to it I’ve learned some new things about macro variables.
Another way to deal with this type of operation is to add the statistics to your main dataset and then do all the calculations on ordinary variables. Please find the example below:
data trees;
if _N_=1 then set meansd(keep=meanheight sdheight);
set trees;
standardized=(height-meanheight)/sdheight;
run;
MS | 09-Apr-10 at 10:29 am
How would you do this if you have more than one variable that you want to standardize, for example, age of tree? It seems that symput would only take the value of the variable for the first observation in the meansd table.
thanks.
mw | 19-Dec-12 at 8:30 am
why go through all this trouble?
there is proc standard…
and it allow by clause…