<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dammit Jim! &#187; statistics</title>
	<atom:link href="http://scott.sherrillmix.com/blog/tag/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://scott.sherrillmix.com/blog</link>
	<description>I'm a biologist not a...</description>
	<lastBuildDate>Mon, 06 Feb 2012 05:19:08 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>SAS Macros: Letting SAS Do the Typing</title>
		<link>http://scott.sherrillmix.com/blog/programmer/sas-macros-letting-sas-do-the-typing/</link>
		<comments>http://scott.sherrillmix.com/blog/programmer/sas-macros-letting-sas-do-the-typing/#comments</comments>
		<pubDate>Sun, 04 Nov 2007 08:09:31 +0000</pubDate>
		<dc:creator>ScottS-M</dc:creator>
				<category><![CDATA[Programmer]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[Statistician]]></category>
		<category><![CDATA[ampersand]]></category>
		<category><![CDATA[arrays]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[do loop]]></category>
		<category><![CDATA[macro]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[std]]></category>
		<category><![CDATA[syput]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[variable]]></category>

		<guid isPermaLink="false">http://scott.sherrillmix.com/blog/programmer/sas-macros-letting-sas-do-the-typing/</guid>
		<description><![CDATA[I've been meaning to write up a bit on using macros in SAS to complement my previous post on macro variables for quite a while. Luckily Norwegian guy reminded me about the pain of starting programming in SAS and provided me some motivation. So here's my take on using macros in programming. So what is [...]]]></description>
			<content:encoded><![CDATA[<p>I've been meaning to write up a bit on using macros in SAS to complement my previous post on <a href="http://scott.sherrillmix.com/blog/programmer/sas-macros/">macro variables</a> for quite a while. Luckily <a href="http://scott.sherrillmix.com/blog/programmer/sas-lag-problems/#comment-14002">Norwegian guy</a> reminded me about the pain of starting programming in SAS and provided me some motivation. So here's my take on using macros in programming.</p>

<p>So what is a macro? The macro facility is a part of SAS that looks through your code before the normal part of SAS sees it and writes out code for you based on a special syntax. If you've ever found yourself copying and pasting code then you've probably been in a situation well suited for macros. They're also great if you need to perform different functions under different conditions. Once I learned macros, SAS seemed a lot more like a usable (although weird) programming language and tasks seemed to get a lot easier (except actually picking the statistical techniques to use).</p>

<p>Probably the easiest way to see what macros do is with an example. So say we once again have a data set of tree heights:</p>

<div class="syntax_hilite"><span class="langName">SAS:</span><br /><div id="sas-8">
<div class="sas"><ol><li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #000080; font-weight: bold;">data</span> trees;</div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0000ff;">input</span> name:$<span style="color: #2e8b57; font-weight: bold;color:#800000;">8</span>. height;</div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">cards;</div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Maple <span style="color: #2e8b57; font-weight: bold;color:#800000;">123</span></div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Maple <span style="color: #2e8b57; font-weight: bold;color:#800000;">78</span></div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Maple <span style="color: #2e8b57; font-weight: bold;color:#800000;">90</span></div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Elm <span style="color: #2e8b57; font-weight: bold;color:#800000;">155</span></div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Elm <span style="color: #2e8b57; font-weight: bold;color:#800000;">65</span></div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Elm <span style="color: #2e8b57; font-weight: bold;color:#800000;">90</span></div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Elm <span style="color: #2e8b57; font-weight: bold;color:#800000;">120</span></div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Birch <span style="color: #2e8b57; font-weight: bold;color:#800000;">100</span></div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Birch <span style="color: #2e8b57; font-weight: bold;color:#800000;">30</span></div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Maple <span style="color: #2e8b57; font-weight: bold;color:#800000;">111</span></div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #000080; font-weight: bold;">run</span>; </div></li></ol></div>
</div></div><br />

<p>I already talked about how to find and use the mean and standard deviation for the <a href="http://scott.sherrillmix.com/blog/programmer/sas-macros/">whole data set</a>. Now what if we wanted to standardize each species by its own separate mean and standard deviation? We could cut and paste, but once we get a few more species or want to change something later this really becomes a hassle. So this is where macros come in.</p>

<p>The first thing to do is to calculate the mean and standard deviation for each species. We can use <code>proc means</code> again to do this. Since we won't be using the printed output I'll add the <code>noprint</code> option, and since we only want the means for the individual species and not the whole dataset I'll add the <code>nway</code> option. The <code>class name;</code> statement tells SAS to find the statistics separately for each species and the <code>output</code> line tells SAS to save the mean and standard deviation in a dataset called <code>meansd</code>.</p>
<div class="syntax_hilite"><span class="langName">SAS:</span><br /><div id="sas-9">
<div class="sas"><ol><li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #000080; font-weight: bold;">proc means</span> <span style="color: #000080; font-weight: bold;">data</span>=trees nway noprint;</div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">class name;</div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0000ff;">var</span> height;</div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0000ff;">output</span> out=meansd <span style="color: #0000ff;">mean</span>=meanheight <span style="color: #0000ff;">std</span>=sdheight;</div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #000080; font-weight: bold;">run</span>; </div></li></ol></div>
</div></div><br />

<p>Now we just need to get the values from the <code>meansd</code> dataset into macro variables. We'll use the _NULL_ dataset and <code>call symput</code> again to create macro variables. This time we need to create separate macro variables for each species. Luckily SAS automatically numbers each observation as the data step reads it, in an automatic variable called <code>_N_</code>. Since each line of the dataset corresponds to a tree species, we can easily use this identifier to create the macro variables by using <code>call symput(&#039;mean&#039;||left(_N_), meanheight);</code>. The <code>left()</code> and <code>trim()</code> functions deal with unnecessary spaces (numeric variables have extra spaces to the left and string variables have spaces to the right) and the <code>||</code> concatenates (connects) the text "mean" with the line number to give <code>mean1</code>, <code>mean2</code>, etc. I'll do the same thing for standard deviation and tree name. Once the macro variables are created, there is still one problem remaining. We don't know how many species there were or how many macro variables were created. Luckily SAS will make another temporary variable that indicates the last line of the dataset when it sees <code>end=newcolumnname</code> following a set statement. Then we just need to check if SAS is on the last line and if so save the line number (<code>_N_</code>) to know the number of species of trees.</p>  

<div class="syntax_hilite"><span class="langName">SAS:</span><br /><div id="sas-10">
<div class="sas"><ol><li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #000080; font-weight: bold;">data</span> <span style="color: #0000ff;">_NULL_</span>;</div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0000ff;">set</span> meansd <span style="color: #0000ff;">end</span>=last;</div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0000ff;">call</span> symput<span style="color: #66cc66;">&#40;</span><span style="color: #a020f0;">'mean'</span>||left<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">_N_</span><span style="color: #66cc66;">&#41;</span>,meanheight<span style="color: #66cc66;">&#41;</span>;</div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0000ff;">call</span> symput<span style="color: #66cc66;">&#40;</span><span style="color: #a020f0;">'sd'</span>||left<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">_N_</span><span style="color: #66cc66;">&#41;</span>,sdheight<span style="color: #66cc66;">&#41;</span>;</div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0000ff;">call</span> symput<span style="color: #66cc66;">&#40;</span><span style="color: #a020f0;">'name'</span>||left<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">_N_</span><span style="color: #66cc66;">&#41;</span>,<span style="color: #0000ff;">trim</span><span style="color: #66cc66;">&#40;</span>name<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;</div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0000ff;">if</span> last <span style="color: #0000ff;">then</span> <span style="color: #0000ff;">call</span> symput<span style="color: #66cc66;">&#40;</span><span style="color: #a020f0;">'numspecies'</span>,<span style="color: #0000ff;">_N_</span><span style="color: #66cc66;">&#41;</span>;</div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #000080; font-weight: bold;">run</span>; </div></li></ol></div>
</div></div><br />
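<p>As a side note, here is a minimal sketch of the same step using <code>call symputx</code> and <code>cats()</code>, assuming you are on SAS 9 or later where both are available. This is a swapped-in alternative rather than the code used in the rest of this post: <code>call symputx</code> strips leading and trailing blanks from both of its arguments, so the <code>left()</code> and <code>trim()</code> calls are not needed. Numeric values may be written with a different number of digits than <code>call symput</code> produces.</p>

```sas
data _NULL_;
set meansd end=last;
/* cats() joins its arguments and removes leading and trailing blanks, */
/* so 'mean' and _N_ become mean1, mean2, ...                          */
call symputx(cats('mean', _N_), meanheight);
call symputx(cats('sd', _N_), sdheight);
call symputx(cats('name', _N_), name);
/* on the last observation, _N_ is the number of species */
if last then call symputx('numspecies', _N_);
run;
```

<p>Running <code>%PUT _USER_;</code> afterwards is an easy way to confirm the same set of macro variables was created.</p>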

<p>If you ever want to check what macro variables you have in your program, you can use <code>%PUT _USER_;</code> to print them all to the log file. Or if you want to see every macro variable available  (SAS has quite a few automatic ones like operating system and date) use <code>%PUT _ALL_;</code>. Inserting <code>%PUT _USER_;</code> here produces:</p>
<div class="syntax_hilite"><span class="langName">SAS:</span><br /><div id="sas-11">
<div class="sas"><ol><li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">GLOBAL NUMSPECIES&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #2e8b57; font-weight: bold;color:#800000;">3</span></div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">GLOBAL NAME1 Birch</div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">GLOBAL NAME2 Elm</div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">GLOBAL NAME3 Maple</div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">GLOBAL MEAN1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #2e8b57; font-weight: bold;color:#800000;">65</span></div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">GLOBAL MEAN2&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #2e8b57; font-weight: bold;color:#800000;">107</span>.<span style="color: #2e8b57; font-weight: bold;color:#800000;">5</span></div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">GLOBAL MEAN3&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #2e8b57; font-weight: bold;color:#800000;">100</span>.<span style="color: #2e8b57; font-weight: bold;color:#800000;">5</span></div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">GLOBAL SD1 <span style="color: #2e8b57; font-weight: bold;color:#800000;">49</span>.<span style="color: #2e8b57; font-weight: bold;color:#800000;">497474683</span></div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">GLOBAL SD2 <span style="color: #2e8b57; font-weight: bold;color:#800000;">38</span>.<span style="color: #2e8b57; font-weight: bold;color:#800000;">837267326</span></div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">GLOBAL SD3 <span style="color: #2e8b57; font-weight: bold;color:#800000;">20</span>.<span style="color: #2e8b57; font-weight: bold;color:#800000;">273134933</span> </div></li></ol></div>
</div></div><br />
<p>Now we've set a lot of macro variables but we still haven't created a real macro. In SAS, macros are started with <code>%MACRO macroname;</code> and finished with <code>%MEND;</code> (short for M[acro]END). <code>%</code>'s are used to indicate commands that the SAS macro facility will read and remove before normal SAS sees the code. Anything not marked with a % will be written out unchanged by the macro facility. Macros don't spit out their code for SAS until they are called using <code>%macroname</code>.</p>

<p>So I'll call my macro <code>treestandardizer</code> but you can call it whatever you want. I'm going to use a pretty simple and specific macro but if you were going to use this often and for different datasets you would want to program it better. The first thing to do is create the <code>final</code> dataset and set it to the <code>trees</code> dataset. Since we need to loop through each species of tree, we'll need a <code>%DO</code> loop. Everything between <code>%DO</code> and <code>%END</code> will be repeated while <code>i</code> increments from 1 to the number of tree species.  If you want to combine text and a macro variable to reference another macro variable, you use the double ampersand <code>&amp;&amp;<!--formatted--></code> in SAS. For example, we want to get the mean for species 1 by looking in the macro variable <code>&amp;mean1<!--formatted--></code> so we use <code>&amp;&amp;mean&amp;i<!--formatted--></code>. I <em>think</em> the macro processing part of SAS ends up running through the code twice, the first time finding the <code>&amp;&amp;<!--formatted--></code> and replacing it with <code>&amp;<!--formatted--></code> and the <code>&amp;i<!--formatted--></code> and replacing it with <code>1</code> to leave <code>&amp;mean1<!--formatted--></code> and the second time finding <code>&amp;mean1<!--formatted--></code> and pasting in the appropriate value (65). So we'll have the do loop write out a series of <code>if</code> statements to check what the name of the tree is and use the appropriate mean and deviation. Note that when using a string macro variable like <code>&amp;nameX<!--formatted--></code>, you need to surround it with double quotes (the macro processor doesn't look inside single quotes) so SAS doesn't think it is a variable name. </p>

<div class="syntax_hilite"><span class="langName">SAS:</span><br /><div id="sas-12">
<div class="sas"><ol><li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0000ff;">%MACRO</span> treestandardizer;</div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #000080; font-weight: bold;">data</span> final;</div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0000ff;">set</span> trees;</div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0000ff;">%DO</span> i = <span style="color: #2e8b57; font-weight: bold;color:#800000;">1</span> <span style="color: #0000ff;">%TO</span> <span style="color: #0000ff; font-weight: bold;">&amp;numspecies</span>;</div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0000ff;">if</span> name=<span style="color: #a020f0;">"&amp;&amp;name&amp;i"</span> <span style="color: #0000ff;">then</span> stheight=<span style="color: #66cc66;">&#40;</span>height-&amp;<span style="color: #0000ff; font-weight: bold;">&amp;mean</span><span style="color: #0000ff; font-weight: bold;">&amp;i</span><span style="color: #66cc66;">&#41;</span>/&amp;<span style="color: #0000ff; font-weight: bold;">&amp;sd</span><span style="color: #0000ff; font-weight: bold;">&amp;i</span>; </div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0000ff;">%END</span>;</div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #000080; font-weight: bold;">run</span>;</div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0000ff;">%MEND</span>; </div></li></ol></div>
</div></div><br />
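<p>To watch the double ampersand resolve outside of a macro, here is a small stand-alone sketch. The names <code>demo1</code> and <code>k</code> are hypothetical and set by hand so they don't overwrite the macro variables created above. The <code>symbolgen</code> system option asks SAS to write each resolution step to the log.</p>

```sas
/* hypothetical values, set by hand only for this demonstration */
%let demo1 = 65;
%let k = 1;

/* write each macro variable resolution step to the log */
options symbolgen;

/* first pass: the double ampersand becomes a single one and k becomes 1 */
/* second pass: the resulting reference to demo1 becomes 65              */
%put The value for item &k is &&demo&k;

options nosymbolgen;
```

<p>The <code>%put</code> line should write <code>The value for item 1 is 65</code> to the log, with the <code>SYMBOLGEN</code> notes around it showing the intermediate steps.</p>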

<p>The previous code prepared the macro but nothing actually happens until we call it using <code>%treestandardizer</code>. Unlike almost everything else in SAS this line doesn't have to end in a semicolon (although it's pretty unlikely to hurt if you forget and add one). So to call the macro:</p>
<div class="syntax_hilite"><span class="langName">SAS:</span><br /><div id="sas-13">
<div class="sas"><ol><li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">%treestandardizer </div></li></ol></div>
</div></div><br />

<p>If you want to see what happens when you call a macro, you can have SAS print the code generated by the macro to the log file with the option <code>option mprint;</code> (make sure to set it before actually calling the macro). In this case, it gives:</p>
<div class="syntax_hilite"><span class="langName">SAS:</span><br /><div id="sas-14">
<div class="sas"><ol><li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">MPRINT<span style="color: #66cc66;">&#40;</span>TREESTANDARDIZER<span style="color: #66cc66;">&#41;</span>:&nbsp; &nbsp;<span style="color: #000080; font-weight: bold;">data</span> final;</div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">MPRINT<span style="color: #66cc66;">&#40;</span>TREESTANDARDIZER<span style="color: #66cc66;">&#41;</span>:&nbsp; &nbsp;<span style="color: #0000ff;">set</span> trees;</div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">MPRINT<span style="color: #66cc66;">&#40;</span>TREESTANDARDIZER<span style="color: #66cc66;">&#41;</span>:&nbsp; &nbsp;<span style="color: #0000ff;">if</span> name=<span style="color: #a020f0;">"Birch"</span> <span style="color: #0000ff;">then</span> stheight=<span style="color: #66cc66;">&#40;</span>height- <span style="color: #2e8b57; font-weight: bold;color:#800000;">65</span><span style="color: #66cc66;">&#41;</span>/<span style="color: #2e8b57; font-weight: bold;color:#800000;">49</span>.<span style="color: #2e8b57; font-weight: bold;color:#800000;">497474683</span>;</div></li>
<li style="font-weight: bold;color:#26536A;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">MPRINT<span style="color: #66cc66;">&#40;</span>TREESTANDARDIZER<span style="color: #66cc66;">&#41;</span>:&nbsp; &nbsp;<span style="color: #0000ff;">if</span> name=<span style="color: #a020f0;">"Elm"</span> <span style="color: #0000ff;">then</span> stheight=<span style="color: #66cc66;">&#40;</span>height- <span style="color: #2e8b57; font-weight: bold;color:#800000;">107</span>.<span style="color: #2e8b57; font-weight: bold;color:#800000;">5</span><span style="color: #66cc66;">&#41;</span>/<span style="color: #2e8b57; font-weight: bold;color:#800000;">38</span>.<span style="color: #2e8b57; font-weight: bold;color:#800000;">837267326</span>;</div></li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">MPRINT<span style="color: #66cc66;">&#40;</span>TREESTANDARDIZER<span style="color: #66cc66;">&#41;</span>:&nbsp; &nbsp;<span style="color: #0000ff;">if</span> name=<span style="color: #a020f0;">"Maple"</span> <span style="color: #0000ff;">then</span> stheight=<span style="color: #66cc66;">&#40;</span>height- <span style="color: #2e8b57; font-weight: bold;color:#800000;">100</span>.<span style="color: #2e8b57; font-weight: bold;color:#800000;">5</span><span style="color: #66cc66;">&#41;</span>/<span style="color: #2e8b57; font-weight: bold;color:#800000;">20</span>.<span style="color: #2e8b57; font-weight: bold;color:#800000;">273134933</span>; </div></li></ol></div>
</div></div><br />

<p>So it worked and we now have the standardized heights in the <code>stheight</code> column of the <code>final</code> dataset. This particular example could be done a few different ways (the easiest and probably better way being to merge the <code>meansd</code> dataset with the <code>trees</code> dataset) but I hope it gives a decent introduction to SAS macros. If you have any specific questions or something wasn't clear, feel free to ask in a comment.</p> 
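<p>For comparison, here is a rough sketch of that merge approach, assuming the <code>meansd</code> dataset from the <code>proc means</code> step above still exists. The dataset names <code>trees_sorted</code> and <code>final_merge</code> are made up for this example. A merge with a <code>by</code> statement needs both datasets sorted by the <code>by</code> variable; <code>meansd</code> already comes out of <code>proc means</code> ordered by <code>name</code>.</p>

```sas
/* sort a copy of the raw data by species so it can be merged */
proc sort data=trees out=trees_sorted;
by name;
run;

data final_merge;
/* each tree picks up the mean and standard deviation of its species */
merge trees_sorted meansd(keep=name meanheight sdheight);
by name;
stheight=(height-meanheight)/sdheight;
run;
```

<p>No macro variables are needed here, and adding more species doesn't change the code.</p>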

<p>Here is the <a href="/res/SAS_macro_example.sas">SAS source code</a> if you don't feel like copying and pasting.</p>

]]></content:encoded>
			<wfw:commentRss>http://scott.sherrillmix.com/blog/programmer/sas-macros-letting-sas-do-the-typing/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>WP_MonsterID and Statistics</title>
		<link>http://scott.sherrillmix.com/blog/programmer/wp_monsterid-and-statistics/</link>
		<comments>http://scott.sherrillmix.com/blog/programmer/wp_monsterid-and-statistics/#comments</comments>
		<pubDate>Wed, 24 Jan 2007 21:44:26 +0000</pubDate>
		<dc:creator>ScottS-M</dc:creator>
				<category><![CDATA[Programmer]]></category>
		<category><![CDATA[Statistician]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[birthday paradox]]></category>
		<category><![CDATA[monster]]></category>
		<category><![CDATA[monsterid]]></category>
		<category><![CDATA[random]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[user]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://scott.sherrillmix.com/blog/programmer/wp_monsterid-and-statistics/</guid>
		<description><![CDATA[After making the WP_MonsterID WordPress plugin to create a random monster avatar from an assortment of parts for each commenter (based on other people's code), fruityoaty asked This looks nifty, but how many monster images are available for assigning? I'd been meaning to calculate this anyway so I did the math and posted it in [...]]]></description>
			<content:encoded><![CDATA[<img class="left" src="/res/images/monsterid_example2.png" alt="An example of a MonsterID" /><p>After making the <a href="http://scott.sherrillmix.com/blog/blogger/wp_monsterid/">WP_MonsterID WordPress plugin</a> to create a random monster avatar from an assortment of parts for each commenter (based on <a href="http://www.docuverse.com/blog/donpark/2007/01/19/identicon-explained">other</a> <a href="http://www.splitbrain.org/projects/monsterid">people's</a> code), <a href="http://fruityoaty.com/">fruityoaty</a> asked <q>This looks nifty, but how many monster images are available for assigning?</q></p>

<p>I'd been meaning to calculate this anyway so I did the math and posted it in the <a href="http://scott.sherrillmix.com/blog/blogger/wp_monsterid/#comment-72">comments</a>:</p>
<blockquote><p>The current part totals are: 17 eyes, 8 hairs, 12 mouths, 15 bodies, 10 legs. That is 244,800 possible combinations. In addition, the body color can range between 20-235 for red, green, and blue. If we count that as 20 distinguishable values for red, green, and blue that adds 8000 possible colors and brings the unique monster count to 2 billion. The only problem is that the algorithm is only using the first 6 digits of the md5 hash of the email which only provides 16 million possible combinations. So I guess the answer is 16 million monsters currently and in the next release I’ll use a few more digits of the hash and increase it to a billion or so. <ins datetime="2007-02-10T21:00:14+00:00"><em>Edit: I did change this so in version 0.3 and later there should be a couple billion possibilities.</em></ins></p></blockquote>
<p>Calculating this got me wondering how many unique users it would take before there was likely to be a duplicate monster. For two users it was easy (1 out of 2 billion) but as the number of users increased things got messy since each new monster could match any of the prior monsters. Luckily I remembered enough of my stats class to google for something on <a href="http://en.wikipedia.org/wiki/Birthday_paradox">calculating the chance of people in a group sharing a birthday</a>.</p> 
<p>If you've never heard of this problem, stop and take a quick guess for how many people you think it would take for the odds to be better than 50% for two people sharing a birthday. Or as my statistics professor put it, <q>There are twenty-five people in the room will you bet me that no one shares the same birthday?</q></p>
<p>...</p>
<p>...</p>
<p>Guessed?</p>
<p>Now I know just enough statistics to know betting against a statistics professor is a bad idea but I have to say, at the time, I thought it would have been a fairly good bet. It turns out that I, like most non-statistics professors, underestimated the chance of any two people in a group sharing the same birthday. Actually, if there are 23 people in a room there is a greater than 50% chance that at least two will share a birthday. If there are 47 people in a room, there is a 95% chance that at least one pair share a birthday. The probability climbs this quickly because, like the monsters, each person added to the room can match any of the previous people (the 5th person can match person 1,2,3,4; the 25th person can match person 1,2,3,4,5,6,...,24;...).</p>
<p>All this was interesting in understanding the problem but didn't really get any closer to finding the probability. Luckily Wikipedia provides an approximation for determining the number of people at a given probability of overlap:</p>
<img src="/res/images/birthday_problem_n_at_p.png" alt="An approximation for calculating the chance two or more people with the same birthday in a group."/>
<p>Substituting in 2 billion for 365 results in a probability of overlap that looks like:</p>
<img src="/res/images/monster_vs_prob_overlap.png" alt="Number of monsters for a given probability of overlap in 2 billion monsters."/>
<p>Even with 2 billion possible monsters there is still a 50% chance of overlap with only 52,000 monsters and a 1 out of 10 chance of overlap with only 20,000 monsters. Most unintuitive to me is that there's a 99% chance of overlap with only 135,000 monsters. The chance of an overlap really does pile up as the number of already present monsters grows. On the plus side, most normal sized blogs should be safe from monster overlap with only a .1% chance of overlap even with 2000 commenters.</p>
<p>So what does all this mean? Well besides not getting suckered in any birthday betting, it's a good reminder to be careful about assuming uniqueness among a group just because the chance of a match is rare. For example, if in some application each user was assigned a random key of 4 digits (10000 possible combinations), there would be a greater than 50% chance of overlap after only about 1% (117 users) of the keys were assigned.</p>
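<p>Since the other recent posts here are in SAS, here is a minimal data step sketch that evaluates the same approximation for both cases discussed above (2 billion monsters and 10000 four-digit keys). The variable names are my own for this example, and <code>log()</code> in SAS is the natural log, which is what the approximation calls for.</p>

```sas
data _NULL_;
/* total possible combinations: 2 billion monsters, then 4-digit keys */
do total = 2e9, 1e4;
  /* probabilities of at least one overlap to look up */
  do p = 0.001, 0.1, 0.5, 0.99;
    /* approximate number of assignments needed to reach probability p */
    n = sqrt(2 * total * log(1 / (1 - p)));
    put total= p= n= 12.1;
  end;
end;
run;
```

<p>The values written to the log should land close to the rounded figures quoted above.</p>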
<p>If anyone feels like messing around with the calculations themselves, here's the function in R to calculate the minimum number of assignments to reach a certain probability of overlap from a total number of possible combinations. I'm sure it would be trivial to convert to any other language. Note that's natural log not log10. <code>number_assignments=function(total_number,probability_overlap){sqrt(2*total_number*log(1/(1-probability_overlap)))}</code></p>]]></content:encoded>
			<wfw:commentRss>http://scott.sherrillmix.com/blog/programmer/wp_monsterid-and-statistics/feed/</wfw:commentRss>
		<slash:comments>33</slash:comments>
		</item>
	</channel>
</rss>

