I just found an interesting (that's interesting in the 'I just spent an hour debugging that?' sense) characteristic in SAS. If you have a variable, x, and are using the lag of x, do NOT put the lag(x) inside a conditional statement. This can apparently cause some pretty strange results. It is probably easier to see through code than to explain:
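Here is a minimal sketch that produces the table below. It is reconstructed from the output, not necessarily the exact original code. The dataset names test and lagtest are assumptions, and so is the condition, which takes the lag of x only when id matches the previous row's id.

```sas
/* Sample data: the six rows from the results table */
data test;
  input id $ x;
  datalines;
a 1
a 2
a 3
b 12
b 13
b 15
;
run;

/* Buggy version: LAG(x) only executes when the IF is true */
data lagtest;
  set test;
  lagid = lag(id);               /* runs on every observation */
  if id = lagid then do;
    lagx = lag(x);               /* runs only on matching rows, so its queue skips rows */
    duration = x - lagx;
  end;
run;

proc print data=lagtest;
run;
```

Because `lagx = lag(x)` only runs on rows 2, 3, 5 and 6, its queue only ever sees x from those rows.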
And here are the results:
[sas]
Obs    id     x    lagid    lagx    duration

  1    a      1                .          .
  2    a      2      a         .          .
  3    a      3      a         2          1
  4    b     12      a         .          .
  5    b     13      b         3         10
  6    b     15      b        13          2
[/sas]
There are obviously some strange things going on here. I have no idea what is going on in the 2nd row of the results, where lagx is missing when it should be 1. And in the 5th row, lagx is the 3 from 2 rows above when it should be 12. Luckily, the solution is easy. Just execute the lag every time:
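Here is a sketch of the fix under the same assumptions. The dataset names are hypothetical, and the six rows come from the table above. Both lags are computed unconditionally, and the condition only decides what to keep.

```sas
data test;
  input id $ x;
  datalines;
a 1
a 2
a 3
b 12
b 13
b 15
;
run;

data lagfixed;
  set test;
  lagid = lag(id);     /* both LAG calls run on every observation */
  lagx  = lag(x);
  if id = lagid then duration = x - lagx;
  else lagx = .;       /* first row of each id: the lag belongs to the previous id */
run;
```

With this version, row 2 gets lagx = 1 and row 5 gets lagx = 12, as expected. The else branch only reassigns lagx to missing. It never calls LAG, so the queue still sees every row.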
Anyway, it took me two years to run into this, so I guess this isn't a common mistake. But if you're ever having trouble with a lag, check to make sure it is executed for every line of the data.
fragtal | 19-Sep-07 at 10:03 pm
Thank you for the post! I've run into the same problem, and it does seem that the SAS lag function doesn't work correctly in conditional statements.
The Norwegian guy | 25-Oct-07 at 11:58 am
I’m a new SAS-programmer, and it took me a mere two months to run into this. It fits nicely in a mile-long list of problems and disappointments about the SAS language, but that’s another story.
The Norwegian guy | 25-Oct-07 at 12:00 pm
Oh. And the fix won’t work if your lagx-variable already exists, and you only want to fill in the holes where it is missing and not replace the others.
ScottS-M | 25-Oct-07 at 9:18 pm
@Norwegian guy
I really hated SAS when I first started with it. I think it took me several months to finally get to the point where I didn't mind using it. If I remember right, getting used to the macro functionality really helped. I still use R for quick or small things, but SAS really is quick when you're working with huge datasets. It seems like Matlab might combine many of the benefits of R with the speed of SAS, but perhaps without as robust a statistics package.
For your already-existing lagx problem, I wonder if you could do something like:
altlagx=lag(x);
if missing(lagx) then lagx=altlagx;
drop altlagx;
That way the lag gets taken outside the conditional and the holes in lagx are filled in with the results.
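Here is a complete version of that workaround, written as a sketch. The data is hypothetical: lagx already exists and has some holes. The sketch also assumes you only want to fill a hole from the same id, so it computes lag(id) unconditionally alongside lag(x).

```sas
/* Hypothetical input: lagx already exists but has holes */
data have;
  input id $ x lagx;
  datalines;
a 1 .
a 2 1
a 3 .
b 12 .
b 13 7
;
run;

data filled;
  set have;
  altlagx = lag(x);      /* unconditional, so the queue sees every row */
  lagid   = lag(id);     /* also unconditional, used to stay within an id */
  if missing(lagx) and id = lagid then lagx = altlagx;   /* fill only the holes */
  drop altlagx lagid;
run;
```

Row 3 is filled with 2. Row 4 stays missing because the previous row belongs to id a. The existing value 7 in row 5 is left alone.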
The Norwegian guy | 26-Oct-07 at 3:24 am
Thanks, ScottS-M
I worked around it after posting yesterday, just like you suggested. I'm still stuck with a bad feeling, though, as these types of "workarounds" seem to be needed for just about everything I do in SAS. In this case, I wanted to manipulate 20 of my variables, had to create 20 more temporary variables to do the manipulation, and now have to create another 20 to bypass this "feature" of the lag function. Needless to say, all of this makes the code extremely bloated and hard to read and debug.
Using macro variables still has me puzzled, as it seems to be no picnic using them actively within a data step. I guess I’ll post about that in another thread :)
I'm still hoping that all of my troubles come down to my not being a seasoned SAS programmer yet. Hopefully I will learn how to code more efficiently as time goes on.
Thanks again for helping, and sorry if I let out too much steam.
The Norwegian guy | 26-Oct-07 at 3:31 am
Oh! Do you know of any SAS forums where users can help each other, discuss problems etc? I’ve been searching around, but I’m having a hard time actually finding any :/
ScottS-M | 26-Oct-07 at 8:01 am
Glad you got it working. No problem letting off steam (and that was pretty mild steam). I remember my first half year or so was filled with expletives.
I tend to bang my head against the wall until I solve things (or switch to another language for the problem), so I don't really frequent any forums. There are the official SAS forums, although they don't seem to have a category for basic questions, and there's the Usenet group comp.soft-sys.sas. Also, in the real world, our university had a nice 2-day SAS course on macros, which would have been great for a beginner, and I also occasionally get emails from a SAS user group in the area.
Also, for macro variables there's my (rather poor) introduction, and now that you mention it, I'll try to get a post up about using macros in the next few days (Edit: Now available here).
Kelly LeVoyer | 05-Nov-07 at 1:46 pm
SAS employee here… There's also http://www.sascommunity.org, though admittedly it's still getting off the ground. If you haven't already, you might want to check out Chris Hemedinger's blog http://blogs.sas.com/sasdummy/ (not to imply you're a dummy ;) but it's a good basic resource as well, and feel free to comment there with questions. Chris co-authored the SAS For Dummies book.
ScottS-M | 06-Nov-07 at 2:47 am
@Kelly LeVoyer
Thanks a lot for stopping by. It’s really cool to see SAS staff involved in user discussions. I think I’ll throw all this ‘where to find help’ into a quick post in the next couple days to make it easier for people to find.
Bobbi | 05-Jul-09 at 6:58 pm
Hey-
I've tried this code and it isn't working. Well, I need to lag a dataset 12 months back by id number, but it keeps lagging it 24 months back. How do I correct this error?
B
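One possible approach, as a sketch only. It assumes a hypothetical dataset monthly with exactly one row per id per month, sorted by id and month, with no gaps. LAG12 is called on every row, and the first 12 rows of each id are blanked out because their lag would reach back into the previous id.

```sas
/* Hypothetical data: two ids with 14 months each */
data monthly;
  do id = 'a', 'b';
    do month = 1 to 14;
      x = month * 10;
      output;
    end;
  end;
run;

data lagged;
  set monthly;
  by id;
  x_lag12 = lag12(x);                 /* LAG12 runs on every row */
  if first.id then n_in_id = 0;
  n_in_id + 1;                        /* sum statement: retained counter within id */
  if n_in_id <= 12 then x_lag12 = .;  /* lag would cross into the previous id */
  drop n_in_id;
run;
```

If LAG12 were called only on some rows (for example inside an IF), it would reach back 12 executions rather than 12 rows. That is the same trap as in the post.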
Jessica | 10-Jul-09 at 6:04 pm
Quote: “I have no idea what is going on in the 2nd row of the results where lagx is missing when it should be 1. And in the 5th row, lagx is the 3 from 2 rows above when it should be 12. ”
I think I know why. In the 2nd row, lagx is missing because this is the first time the conditional statement is true and executed, so there is no previous "x" value yet. The 5th row shows 3 because the previous value of x (inside the conditional statement) is 3, from the last time the conditional statement was executed before this one; hence lagx = 3.
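To make Jessica's explanation concrete: a conditional `lagx = lag(x)` behaves like a retained variable that is read and updated only when the condition is true. The sketch below swaps LAG for RETAIN and a hypothetical helper variable prev_x. It uses the six rows from the post's results table and assumes the lag is taken only when id matches the previous id. It should reproduce the same odd lagx values: missing in row 2 and 3 in row 5.

```sas
data test;
  input id $ x;
  datalines;
a 1
a 2
a 3
b 12
b 13
b 15
;
run;

data emulated;
  set test;
  retain prev_x;               /* hypothetical stand-in for the LAG queue */
  lagid = lag(id);
  if id = lagid then do;
    lagx     = prev_x;         /* value from the last time this block ran */
    prev_x   = x;              /* store the current x for the next time */
    duration = x - lagx;
  end;
  drop prev_x;
run;
```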
ScottS-M | 11-Jul-09 at 12:18 am
Hmm that does sound about right. Good thinking.
Ozzy | 20-Aug-09 at 10:24 am
Just saw your post. I had the same problem, but just for the second record, where it seemed to go two back. I tried every trick I could think of, no go. I simply moved the lag outside the else statement and into the main body and *voilà*, it worked like a charm.
BTW, a nice write-up and examples of the lag from heck :0)
http://www.howles.com/saspapers/CC33.pdf
Nick | 01-Jul-10 at 2:53 pm
From what I understand, this is not a bug. Calling lag(x) does two things: (1) it saves the current value of x for the next call of lag(x); and (2) it returns the previously-saved value of x (saved by a prior call to lag(x)).
Here’s someone’s PDF explaining what they call a common misuse of lag():
http://www.nesug.org/proceedings/nesug06/cc/cc32.pdf
It makes sense to me in that “altered way of thinking” that SAS sometimes provokes lol.
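Nick's description means LAG works per call rather than per row. This sketch (all names hypothetical) shows the effect even inside a single DO loop. One lag(x) runs on every pass, a second runs only on even passes, and each keeps its own queue.

```sas
data demo;
  do n = 1 to 6;
    x = n * 10;
    every = lag(x);            /* called on every pass: returns the previous pass's x */
    if mod(n, 2) = 0 then
      evens = lag(x);          /* called on even passes only: returns x from the last even pass */
    else evens = .;
    output;
  end;
run;
```

every comes out as ., 10, 20, 30, 40, 50. evens is missing on odd passes, missing on pass 2 (the first time that call runs), and 20 and 40 on passes 4 and 6.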
VVS | 18-May-11 at 9:44 pm
I really spent ~2 hours debugging the same type of (buggy) behaviour. The source of this behaviour seems to be that SAS really creates 2 separate, independent local iterators (and variables) for each "block" inside the if … and else … statements. The only clue to understanding this is the way SAS defines its syntax: you should use "do; … end;" to define a solid block of commands inside SAS code, which is very important. Of course, there is not a word about what they mean by "it is important", nor that if you don't declare "do; … end;" it is declared anyway in the "background" :). But really, the phrase "it is important" means that any variable inside such a block is local to that block… :(
I think you could use lag() inside if .. else.. if you find a way to declare your variable as global…
Another idea is that Java (in which the SAS macro language was written?) is too stupid a language, one that makes everything an object, even in a simple macro language translator parsing if..else.. So we get separate, uncooperative object pieces in SAS from what is solid code in our minds, and are left guessing where the mistake was…
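The SAS documentation for LAG describes a different mechanism than block-local variables. Each occurrence of LAG in a program generates its own queue, and that queue only advances when that occurrence executes. The sketch below uses hypothetical data and puts a lag(x) in each branch. The two queues never mix.

```sas
data branches;
  input id $ x;
  if id = 'a' then lagx = lag(x);   /* queue #1: only sees rows where id = 'a' */
  else lagx = lag(x);               /* queue #2: only sees the other rows */
  datalines;
a 1
b 12
a 2
b 13
;
run;
```

Row 3 (id a) gets 1 and row 4 (id b) gets 12. Each value comes from the previous row that its own branch handled. The fix is the same as in the post: call LAG once, before the if/else.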
JonathanP | 17-Jun-11 at 3:35 pm
Try something like this:
/************************/
/*YOU NEED BY PROCESSING*/
/************************/
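data test;
  input id $ x;
  cards;
a 1
a 2
a 3
b 12
b 13
b 15
;
run;
data test;
  set test;
  by id;
  array AC[*] c;              /* one-element array holding the lagged value */
  c = lag1(x);                /* LAG1 runs on every observation */
  if first.id then count = 1;
  do i = count to dim(ac);    /* on the first row of an id, blank out the lag */
    ac(i) = .;
  end;
  count + 1;                  /* sum statement: COUNT is retained across rows */
  drop count i;               /* replaces the separate PROC SQL ALTER TABLE step */
run;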
Brittany | 02-Sep-11 at 2:46 pm
Thank you for posting this! Donny & I were struggling over this issue. You saved us hours!
Johanne Lalonde | 29-Mar-12 at 4:23 pm
Thank you so much! I was struggling with that one.
Eo | 21-Jun-12 at 6:33 am
Thank you very much.
Chris Maxwell | 20-Sep-12 at 4:20 pm
Thanks a ton! This was driving me nuts!!
Jane | 14-May-13 at 5:05 pm
And one more cheer from the crowd of the thankful! The internet is awesome. And so are you!