Converting Between Gapped and NonGapped DNA Coordinates in R

I’ve needed to convert coordinates in gapped DNA alignments back and forth to coordinates in the nongapped sequence a lot recently. For example, the T in AC--TGA--A is in the 5th position, but in the gapless sequence ACTGAA it is 3rd. My first few tries at programming up a decent converter (counting characters in regex’d substrings) did not work very well with large datasets. Since I had to do it with 50,000 positions in a sequence with 2 million letters, I was running into a little trouble. But it ended up being a pretty easy problem in R, so I thought I’d post it in case anybody else is running into something similar. It’s also a pretty good example of why programming in R can be fun.

[R]
# Convert gapped coordinates to coordinates in the gapless sequence
gap2NoGap<-function(gapSeq,coords){
  gapSeqSplit<-strsplit(gapSeq,'')[[1]]
  nonDash<-gapSeqSplit!='-'
  newCoords<-cumsum(nonDash)
  return(newCoords[coords])
}
[/R]

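So the function takes a gapped sequence and the gapped coordinates. It splits the gapped sequence into a vector of single characters and finds which ones are not dashes. It then takes the cumulative sum at each element of that vector (TRUE evaluates as 1 and FALSE as 0). This gives a vector where each element holds the gap-free coordinate for that gapped position. For AC--TGA--A the cumulative sums are 1 2 2 2 3 4 5 5 5 6, so gapped position 5 maps to 3. The function then just returns the appropriate new coordinate for each old coordinate. Note that a coordinate that lands on a gap gets the coordinate of the last nongap letter before it (or 0 if there is none).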

[R]
# Convert gapless coordinates to coordinates in the gapped sequence
noGap2Gap<-function(gapSeq,coords){
  gapSeqSplit<-strsplit(gapSeq,'')[[1]]
  newCoords<-which(gapSeqSplit!='-')
  return(newCoords[coords])
}
[/R]

The opposite conversion function again takes the gapped sequence, but this time with the nongapped coordinates. It splits the gapped sequence into a vector of single characters and finds the indices of the characters which are not dashes. Since we only stored the indices of nongap characters, this vector gives the gapped coordinate for each nongap letter. For AC--TGA--A those indices are 1 2 5 6 7 10, so nongapped position 3 maps back to 5. So again the function can just return the appropriate new coordinate for each old coordinate.

These ended up being pretty elegant in R (once I finally figured them out).



Converting .eps to .png Easily

I end up generating a lot of postscript plots in R and other programs. Unfortunately, a lot of not-so-technical people have trouble opening postscript files, so I end up having to convert these images to other formats pretty often. A really handy program for converting eps files to png (or jpg, although that’s not really an optimal format for plots) is ImageMagick (available for all OSs, I believe). ImageMagick lets you quickly convert images (and create thumbnails, make them B&W,…) from the command line without having to open up Photoshop.

For example, to convert an image named myPlot.eps to png you just need to enter convert myPlot.eps myPlot.png (convert is a program in the ImageMagick package) at the command prompt and you’ll get a png file in myPlot.png. If you want to adjust the resolution (the default resolution is 72 dpi) of the output image, you can add the -density option (e.g. for 200 dpi convert -density 200 myPlot.eps myPlot.png). Make sure you put the -density part before the input image name.
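To get a feel for what -density does to the size of the output, here is a rough sketch of the arithmetic. It assumes the eps dimensions are in PostScript points (72 per inch) and that the rasterized size scales linearly with the density. The function name px_at_density is made up for illustration, and real output may be off by a pixel or so because of rounding.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the pixel length of one side of a rasterized eps.
# $1 = length in PostScript points (assumed 72 points per inch), $2 = density in dpi
px_at_density() {
  # Integer arithmetic, rounded to the nearest pixel
  echo $(( ($1 * $2 + 36) / 72 ))
}

# A 6 x 4 inch plot has a 432 x 288 point bounding box
px_at_density 432 72    # width at the default 72 dpi
px_at_density 432 200   # width with -density 200
px_at_density 288 200   # height with -density 200
```

So under these assumptions a 6 x 4 inch plot comes out at roughly 1200 x 800 pixels with -density 200, versus 432 x 288 at the default.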

Converting many files at once is where ImageMagick really shines. The mogrify command is probably the quickest option. For example, to convert the files image01.eps, image02.eps and image03.eps to png, just use the command mogrify -format png image*.eps. In one shot, it will create image01.png, image02.png and image03.png.

Unfortunately, recent versions of ImageMagick seem to be treating eps to png conversions oddly (see below), so mogrify isn’t cutting it on my files. If you have similar trouble (and you’re on Unix or Mac or Cygwin), you can just use a bit of Bash combined with the convert command to get around the problem like this:

[bash] for f in *.eps; do convert -density 100 "$f" -flatten "${f%.*}.png"; done [/bash]
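The ${f%.*} bit in the loop strips the last extension from the file name so that .png can be tacked on. If you want to check what the loop would do before running it on real files, a dry run along these lines might help. This is only a sketch: png_name and show_convert_cmds are names I made up, and nothing here actually calls ImageMagick.

```shell
#!/usr/bin/env bash
# Hypothetical helper: swap the last extension of a file name for .png
png_name() {
  printf '%s\n' "${1%.*}.png"
}

# Print (but do not run) the convert command for each file given as an argument
show_convert_cmds() {
  for f in "$@"; do
    printf 'convert -density 100 "%s" -flatten "%s"\n' "$f" "$(png_name "$f")"
  done
}

# File names with spaces or extra dots are handled because everything is quoted
show_convert_cmds image01.eps "my plot.v2.eps"
```

Once the printed commands look right, you could pass *.eps instead of the example names and run the real loop.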

Black Background Problem

As I mentioned above, I started having trouble converting from eps to png after upgrading to ImageMagick 6.4. The transparent/white backgrounds in my eps files were being converted to black backgrounds, making the figures unreadable. I guess it must be some change in how transparency is handled, but I’m not totally sure what changed. It took me a bit of googling before I found the solution, so I’ll repost it here. Adding -flatten to the command (e.g. convert myImage.eps -flatten myImage.png) should change the background back to white. My version of mogrify doesn’t seem to have the -flatten option, so convert (as in the loop above) seems like the way to go.

