SAS lag problems

I just found an interesting characteristic in SAS (that's interesting in the 'I just spent an hour debugging that?' sense). If you have a variable, x, and are using the lag of x, do NOT put lag(x) inside a conditional statement. It can cause some pretty strange results. This is probably easier to show in code than to explain:

[sas]
data test;
  input id $ x;
  cards;
a 1
a 2
a 3
b 12
b 13
b 15
;
run;

data test;
  set test;
  lagid = lag(id);
  if lagid = id then do;
    lagx = lag(x);      /* only executed when the condition is true */
    duration = x - lagx;
  end;
run;

proc print;
run;
[/sas]

And here are the results:

[sas]
Obs    id     x    lagid    lagx    duration
  1    a      1                .           .
  2    a      2      a         .           .
  3    a      3      a         2           1
  4    b     12      a         .           .
  5    b     13      b         3          10
  6    b     15      b        13           2
[/sas]

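There are obviously some strange things going on here. In the 2nd row, lagx is missing when it should be 1, and in the 5th row, lagx is the 3 from two rows above when it should be 12. The cause is that LAG does not simply look up the previous observation: each LAG call keeps a queue of the values passed to it and only updates that queue when the call actually executes. Here lag(x) first runs on row 2, so it returns missing there (its queue is still empty), and it is skipped on row 4, so on row 5 it returns 3, the last x it saw. Luckily, the solution is easy. Just execute the lag every time: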

[sas]
data test;
  set test;
  lagid = lag(id);
  lagx = lag(x);        /* executed on every observation, so the queue stays in step */
  if lagid = id then do;
    duration = x - lagx;
  end;
run;
[/sas]
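To see the queue behavior on its own, here is a small sketch. The dataset names have and queue_demo and the variables evenlag and alllag are made up for illustration. One LAG call runs only on even-numbered observations and a second one runs on every observation; each LAG call in the code keeps its own separate queue.

[sas]
/* Input data: the same six records used above */
data have;
  input id $ x;
  datalines;
a 1
a 2
a 3
b 12
b 13
b 15
;
run;

data queue_demo;
  set have;
  /* This LAG call only runs on observations 2, 4 and 6,
     so its queue only ever receives x = 2, 12 and 15 */
  if mod(_n_, 2) = 0 then evenlag = lag(x);
  /* This separate LAG call runs on every observation and has its own queue */
  alllag = lag(x);
run;

proc print data=queue_demo;
run;
[/sas]

In the printed output, alllag should simply be the previous row's x (missing, 1, 2, 3, 12, 13), while evenlag is missing on row 2 (that call's first execution), 2 on row 4 and 12 on row 6: the x values from the last time that particular call ran.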

Anyway, it took me two years to run into this, so I guess it isn’t a common mistake, but if you’re ever having trouble with a lag, make sure it is executed for every observation in the data.
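If the data is sorted by id, BY-group processing is another common way to get the same durations without comparing against a lagged id. This is a sketch assuming sorted input; the dataset names have and want are hypothetical. The LAG call is still unconditional, and FIRST.id is only used to throw away the value carried over from the previous group.

[sas]
/* Input data: the same six records used above */
data have;
  input id $ x;
  datalines;
a 1
a 2
a 3
b 12
b 13
b 15
;
run;

/* BY-group processing requires the data to be sorted by id */
proc sort data=have;
  by id;
run;

data want;
  set have;
  by id;
  lagx = lag(x);              /* unconditional: the queue is updated on every row */
  if first.id then lagx = .;  /* discard the x carried over from the previous id */
  else duration = x - lagx;
run;
[/sas]

Here duration should come out as 1, 1, 1 and 2 on rows 2, 3, 5 and 6, and missing on the first row of each id.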