Great Turtle Race

I’m a bit late on this one (I don’t know how I managed to miss it since I had to put the data together) but National Geographic and Conservation International had a Great Turtle Race with a bunch of leatherback turtles tagged by my old adviser. They took data from turtles tracked from Nova Scotia to South America and had a big two-week event watching which turtle reached the Caribbean first. They have a pretty cool animation of the satellite tracking (although of course not quite as good as mine) and some cute leatherback artwork (complete with leathery back instead of shell, although why are they green?).

Flash animation of Great Turtle Race Leatherback turtle game

That site also has the first leatherback game I’ve ever seen. The artist did a really good job since the view is pretty much identical to the view we get from a shoulder-mounted turtleCam. Unfortunately, the game turtle handles like a tank, which really doesn’t do justice to the maneuvering ability of leatherbacks. They’re huge animals but in the water they’re really quite graceful and they can turn on a dime (as I quickly found out when we were trying to catch them).

Backspacer leatherback by Chris Rooney

Anyway, it looks like the turtle named Backspacer (it’s weird to see all the interesting names since we always call the turtles by their tag ID number), sponsored by Pearl Jam, yes that Pearl Jam, won the race. Turtle Cali won the diving portion of the race and received an Iron Turtle Award. Here’s a nice post-race summary and also Olympic swimmer (and turtle coach) Jason Lezak’s take on it. It’s great to see so much public interest in leatherback turtle tracking and National Geographic and Conservation International did a great job promoting and running the event.

Biologist
Leatherback

Displaying Code in LaTeX

gioby of Bioinfo Blog! (an interesting read by the way) left a comment asking about displaying code in LaTeX documents. I’ve sort of been kludging around using \hspace’s and \textcolor, but I’ve always meant to figure out the right way to do things, so this seemed like a good chance to do it.

LaTeX tends to ignore white space. This is good when you’re writing papers but not so good when you’re trying to show code where white space is an essential part (e.g. Python). Luckily there’s a built-in verbatim environment in LaTeX that is equivalent to HTML’s <pre>. So something like the following should preserve white space.

Code in LaTeX using verbatim
\begin{verbatim}
for i in range(1, 5):
  print i
else:
  print "The for loop is over"
\end{verbatim}

Unfortunately, you can’t use any normal LaTeX commands inside verbatim (since they’re displayed verbatim). But luckily there’s a handy package called fancyvrb that fixes this (the color package is also useful for adding colors). For example, if you wanted to highlight “for” in the above code, you can use the Verbatim (note the capital V) environment from fancyvrb:

Code in LaTeX using fancyvrb
% Needs \usepackage{fancyvrb} and \usepackage{color} in the preamble
\newcommand\codeHighlight[1]{\textcolor[rgb]{1,0,0}{\textbf{#1}}}
\begin{Verbatim}[commandchars=\\\{\}]
\codeHighlight{for} i in range(1, 5):
  print i
else:
  print "The for loop is over"
\end{Verbatim}
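
If you have more than a couple of words to highlight, typing the \codeHighlight markup by hand gets tedious. Here is a minimal Python sketch that generates the Verbatim block for you. The function name to_fancyvrb and its arguments are made up for this illustration, and it assumes the \codeHighlight macro from above is already defined. Because backslashes and braces become live characters under commandchars=\\\{\}, the sketch simply refuses code that contains them instead of guessing at an escaping scheme.

```python
import re


def to_fancyvrb(code, keywords, macro="codeHighlight"):
    """Wrap whole-word matches of `keywords` in \\<macro>{...} and
    return the code inside a fancyvrb Verbatim environment.

    Hypothetical helper for illustration only.
    """
    # \, { and } are command characters inside this Verbatim block,
    # so bail out rather than emit LaTeX that would be misread.
    if re.search(r"[\\{}]", code):
        raise ValueError("code contains \\, { or }; this sketch does not escape them")

    body = code
    if keywords:
        # Plain whole-word match; this is not a real tokenizer.
        pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b")
        body = pattern.sub(lambda m: "\\" + macro + "{" + m.group(1) + "}", code)

    header = r"\begin{Verbatim}[commandchars=\\\{\}]"
    footer = r"\end{Verbatim}"
    return header + "\n" + body + "\n" + footer


print(to_fancyvrb("for i in range(1, 5):\n  print i", ["for"]))
```

Note that this is a plain word match, so it would also highlight the “for” inside a string like "The for loop is over". For anything that needs to understand the language, a real tokenizer such as Pygments is the better tool.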
Code in LaTeX using pygmentize

If you really want to get fancy, the Pygments package in Python will output syntax-highlighted LaTeX code with a command like:

pygmentize -f latex -O full test.py >py.tex

The LaTeX it outputs is a bit hard to read but it’s not too bad (it helped me figure out the fancyvrb package) and it does make nice syntax-highlighted output.

Here’s an example LaTeX file with the three examples above and the pdf it generates if you’re curious.

LaTeX
Programmer

Liberty Bell Judo Tournament

We had a big judo tournament in Philadelphia last weekend. I went and had a lot of fun. My coach put up a few videos (conveniently leaving out the ones where I lost) so I thought I’d link to them here.

They’re both pretty short because I got in lucky throws that were big enough to end them early.

Uncategorized

Counting Q20 Bases in a .qual File

I sometimes get asked to count the number of bases with qualities greater than or equal to 20 in a quality file. I’m not entirely sure this is all that good a metric with 454 sequencing but that’s another story. It always takes me a minute or two to come up with the right Unix commands to do it so I’m going to post it here so I remember (and maybe save someone else a couple minutes).

cat *qual|grep '^[^>]'|sed 's/ /\n/g'|grep -c '[234][0-9]'

This is very quick and dirty (just removing lines starting with “>”, replacing spaces with newlines and counting the resulting lines that contain a two-digit value from 20 to 49, which covers quals 20-40) but it seems to work ok for me. Also, yes, I know it’s stupid to cat to a grep, but I often replace the cat with head for testing. And I’m sure you could do it in a single awk or sed step, but it gets done in a minute or two for several hundred million bases so I haven’t really been motivated to change it.
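
If you’d rather not rely on a regex for a numeric comparison, the same count can be sketched in a few lines of Python. The function name count_q20 and the sample lines are made up for illustration. It assumes the usual .qual layout of “>” header lines followed by whitespace-separated integer scores, and it will raise an error on anything that isn’t an integer.

```python
def count_q20(lines, threshold=20):
    """Count quality scores >= threshold in the lines of a .qual file.

    Hypothetical helper; `lines` can be an open file or any iterable
    of strings.
    """
    total = 0
    for line in lines:
        if line.startswith(">"):
            continue  # skip the header line of each read
        # Compare numerically instead of pattern-matching the digits
        total += sum(1 for q in line.split() if int(q) >= threshold)
    return total


# Made-up example data in .qual layout
sample = [">read1", "35 20 19 40", "12 27", ">read2", "8 21"]
print(count_q20(sample))  # 5
```

For a real file you would pass an open file handle instead of the list. Unlike the grep pattern, this has no upper bound on the scores it counts, and it is likely slower than the shell pipeline on very large files.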

Bash/UNIX
Bioinformatics
Biologist
Programmer

Converting Between Gapped and NonGapped DNA Coordinates in R

I've been needing to convert coordinates in gapped DNA alignments back and forth to coordinates in the nongapped sequence a lot recently. For example, the T in AC--TGA--A is in the 5th position but in the gapless sequence ACTGAA it is 3rd. My first few tries (counting chars in regex'd substrings) at programming up a decent converter ended up not working very well with large datasets. Since I had to do it with 50,000 positions in a sequence with 2 million letters, I was running into a little trouble. But it ended up actually being a pretty easy problem in R so I thought I'd post it up in case anybody else is running into something similar. Also it's a pretty good example of why programming in R can be fun.

R:
gap2NoGap<-function(gapSeq,coords){
   gapSeqSplit<-strsplit(gapSeq,'')[[1]]
   nonDash<-gapSeqSplit!='-'
   newCoords<-cumsum(nonDash)
   return(newCoords[coords])
}

So the function takes a gapped sequence and the gapped coordinates. It splits the gapped sequence into an array of single characters and finds which ones are not -'s. It then takes the cumulative sum at each element in the array (TRUE evaluates as 1 and FALSE as 0). This gives an array where each element gives the new gap-free coordinates. So the function just returns the appropriate new coordinate for each old coordinate.
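
For anyone who doesn't use R, here is a rough Python sketch of the same cumulative-sum idea. The name gap_to_nogap is made up for this example, and it keeps R's 1-based coordinates so the numbers match the text above.

```python
from itertools import accumulate


def gap_to_nogap(gap_seq, coords):
    """Convert 1-based gapped coordinates to 1-based ungapped ones.

    Hypothetical Python counterpart of the R function above.
    """
    # 1 for every base, 0 for every gap, then a running total:
    # entry i-1 holds the ungapped coordinate of gapped position i.
    running = list(accumulate(int(ch != "-") for ch in gap_seq))
    return [running[c - 1] for c in coords]


print(gap_to_nogap("AC--TGA--A", [5]))  # [3]
```

One thing worth knowing about this approach: if a coordinate falls on a gap, it comes back as the coordinate of the last base before that gap (position 3 in the example maps to 2).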

R:
noGap2Gap<-function(gapSeq,coords){
   gapSeqSplit<-strsplit(gapSeq,'')[[1]]
   newCoords<-which(gapSeqSplit!='-')
   return(newCoords[coords])
}

This opposite conversion function again takes the gapped sequence, but this time with the nongapped coordinates. It splits the gapped sequence into an array of single characters and finds the indices of the characters which are not -'s. Since we only stored the indices of nongap characters, this array gives the gapped coordinates for each nongap letter. So again the function can just return the appropriate new coordinate for each old coordinate.
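
The reverse direction can be sketched in Python the same way. Again, nogap_to_gap is a made-up name and the coordinates are 1-based to match R.

```python
def nogap_to_gap(gap_seq, coords):
    """Convert 1-based ungapped coordinates to 1-based gapped ones.

    Hypothetical Python counterpart of the R function above.
    """
    # Gapped positions of every non-gap character, in order:
    # entry j-1 is where the j-th base sits in the alignment.
    gapped_positions = [i for i, ch in enumerate(gap_seq, start=1) if ch != "-"]
    return [gapped_positions[c - 1] for c in coords]


print(nogap_to_gap("AC--TGA--A", [3]))  # [5]
```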

These ended up being pretty elegant in R (once I finally figured them out).

Bioinformatics
Biologist
Programmer
R
