Tab Indented Standard Input Redirect in Bash

I managed to forget how to redirect standard input (when you want to feed a bunch of lines to a program, a.k.a. a here document) in a bash script while still keeping the indentation, and had to go digging around for it. So I figured I’d make a note here so I don’t forget again and for anyone else in the same boat. It’s just <<- instead of <<. For example, if you want to keep indentation within a loop:

[bash]
for i in 1 2 3 4;do
	cat<<-EOF
		This is loop $i
		More advanced stuff could go here
	EOF
done
[/bash]

You can use whatever you want to mark the end of the input instead of EOF if it floats your boat (as long as you use the same thing both times). Unfortunately, <<- only strips leading tabs, so it doesn’t work if you indent with spaces (although I’m a tab man myself).




Displaying Code in LaTeX

gioby of Bioinfo Blog! (an interesting read by the way) left a comment asking about displaying code in LaTeX documents. I’ve sort of been kludging around using \hspace’s and \textcolor, but I’ve always meant to figure out the right way to do things, so this seemed like a good chance.

LaTeX tends to ignore white space. This is good when you’re writing papers but not so good when you’re trying to show code where white space is an essential part (e.g. Python). Luckily there’s a built-in verbatim environment in LaTeX that is equivalent to HTML’s <pre>. So something like the following should preserve white space.

Code in LaTeX using verbatim
\begin{verbatim}
for i in range(1, 5):
  print i
  print "The for loop is over"
\end{verbatim}

Unfortunately, you can’t use any normal LaTeX commands inside verbatim (since they’re displayed verbatim). But luckily there’s a handy package called fancyvrb that fixes this (the color package is also useful for adding colors). For example, if you wanted to highlight “for” in the above code, you can use the Verbatim (note the capital V) environment from fancyvrb:

Code in LaTeX using fancyvrb
\begin{Verbatim}[commandchars=\\\{\}]
\codeHighlight{for} i in range(1, 5):
  print i
  print "The for loop is over"
\end{Verbatim}

Code in LaTeX using pygmentize

If you really want to get fancy, the Pygments package in Python will output syntax-highlighted LaTeX code with a command like:

pygmentize -f latex -O full >py.tex

The LaTeX it outputs is a bit hard to read, but it’s not too bad (it helped me figure out the fancyvrb package) and it does make nice syntax-highlighted output.

Here’s an example LaTeX file with the three examples above, and the PDF it generates, if you’re curious.




Counting Q20 Bases in a .qual File

I sometimes get asked to count the number of bases with qualities greater than or equal to 20 in a quality file. I’m not entirely sure this is all that good a metric with 454 sequencing but that’s another story. It always takes me a minute or two to come up with the right Unix commands to do it so I’m going to post it here so I remember (and maybe save someone else a couple minutes).

cat *qual|grep '^[^>]'|sed 's/ /\n/g'|grep -c '[234][0-9]'

This is very quick and dirty (just removing lines starting with “>”, replacing spaces with newlines and counting the resulting lines that contain a 2, 3 or 4 followed by another digit, i.e. quals 20-49), but it seems to work OK for me. And yes, I know it’s stupid to cat to a grep, but I often replace the cat with head for testing. I’m sure you could do it in a single awk or sed step, but it gets done in a minute or two for several hundred million bases so I haven’t really been motivated to change it.
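For comparison, here’s a rough Python sketch of the same count done in one pass. It assumes the usual .qual layout (header lines starting with “>” and space-separated integer scores on the other lines); the function name count_q20 and the threshold parameter are just made up for illustration. Unlike the grep pattern, it compares the numbers directly, so any score of 20 or more is counted.

```python
def count_q20(lines, threshold=20):
    """Count quality scores >= threshold in .qual-style lines."""
    count = 0
    for line in lines:
        # Skip FASTA-style header lines such as ">read_name"
        if line.startswith(">"):
            continue
        # split() with no arguments copes with repeated spaces and newlines
        for score in line.split():
            if int(score) >= threshold:
                count += 1
    return count
```

Since it only needs an iterable of lines, you can hand it an open file directly, e.g. count_q20(open(path)) for some path of your own.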




How SINs (and Credit Card Numbers) Are Validated

A while back it was tax season in Canada and a friend of mine was trying to do his taxes online. But since he was foreign and didn’t have a Social Insurance Number (their equivalent of an SSN), the helpful webapp wouldn’t let him print the thing (of course it only informed him of that after he had already entered everything). We tried a few guesses before finally just using mine and crossing out my numbers after printing it. But I always wondered how the form knew our guesses were invalid. Luckily, I recently stumbled across a mention of how credit card numbers are validated.

It turns out SINs and credit card numbers are checked with something called the Luhn algorithm. Basically, the algorithm involves taking each digit, multiplying every second digit counting from the right (the second, fourth, sixth and so on) by 2 and adding up all the resulting digits (e.g. if 7 is multiplied by 2 then the resulting 14 is split into 1+4). If the sum of the digits is a multiple of 10, the number passes.

For example, to check 345678, you’d split it into 3,4,5,6,7,8. Then multiply 3, 5 and 7 by 2 to give 6, 4, 10, 6, 14, 8. Then split all the digits again to give 6, 4, 1, 0, 6, 1, 4, 8. That adds up to 30 so 345678 would be a valid credit card number (if there wasn’t a set length).
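The steps in that example can be traced with a small Python sketch. The function names luhn_digits and luhn_check are made up for illustration, and this is just one straightforward way to write the check, not any official implementation.

```python
def luhn_digits(number):
    """Return the individual digits that get summed in the Luhn check."""
    digits = [int(ch) for ch in str(number) if ch.isdecimal()]
    summed = []
    # Walk from the rightmost digit; position 0 is the last digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2  # every second digit from the right is doubled
        # Split two-digit results (e.g. 14 becomes 1 and 4).
        summed.extend(int(ch) for ch in str(digit))
    return summed


def luhn_check(number):
    """True if the digits sum to a multiple of 10."""
    return sum(luhn_digits(number)) % 10 == 0
```

For 345678, sum(luhn_digits("345678")) gives the same total of 30 as the example above, so luhn_check("345678") passes. Anything that isn’t a digit (spaces, dashes) is simply ignored.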

Just for fun, here’s a quick function in R to run the Luhn Algorithm on a number (or tell you the remainder so you can adjust):

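[R]
luhnCheck <- function(number,returnLogical=TRUE){
  # Keep only the digits and split them into a numeric vector
  numbers <- gsub('[^0-9]','',as.character(number))
  numbers <- as.numeric(strsplit(numbers,'')[[1]])
  # Double every second digit counting from the right
  if(length(numbers) > 1){
    selector <- seq(length(numbers)-1,1,-2)
    numbers[selector] <- numbers[selector]*2
  }
  # Subtracting 9 is the same as adding the two digits of a doubled value
  numbers[numbers > 9] <- numbers[numbers > 9] - 9
  remainder <- sum(numbers) %% 10
  if(returnLogical) return(remainder==0)
  else return(remainder)
}
[/R]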

So the next time some stupid web form needs a SIN, I’m going with 999999998.




Cancer Fighting Bacteria

I was doing a bit of background reading and came across an interesting paper about mutating normal bacteria into cancer-fighting bacteria. The paper centers around a single gene called inv (short for invasin) that can give an otherwise mild-mannered, noninfectious bacterium the ability to invade cells.

Now this might seem like a pretty bad idea since there are probably enough infectious bacteria in the world already, but this was only the first step of the research. Anderson and colleagues attached inv to a genetic switch (normally used for bacterial metabolism control) that turns on when arabinose (a type of sugar) is present. Unfortunately this switch was a little leaky, so even bacteria without arabinose were still infectious. Not ones to let that stop them, the researchers took out the ribosome (the cell’s protein-making machinery) binding region of the gene, randomly mutated it and tested to find bacteria that were off by default but still able to turn on.

Once they got that working, they decided to attach a sensor to the infective gene. Bacteria often do things like switch metabolisms when they run out of oxygen. The researchers picked one of the bacterial genes that turns on when oxygen is low and replaced the arabinose switch from the previous bit with the oxygen-sensing switch from this gene. Again the switch was leaky and they had to mutate it so it stayed off by default. Once that was done they had a bacterium that was only invasive in anaerobic environments. That’s pretty cool because tumors are often anaerobic (since they’re big lumps of fast-growing dense tissue).

Plasmid for density dependent infectious bacteria

To go even further, the researchers tried to create bacteria that only turn on when there are many bacteria in one location. This is useful because tumors often have higher concentrations of bacteria due to leaking nutrients and poor immune response. By creating a switch that only turns on when a bunch of bacteria are present, the bacteria can be further targeted to cancerous cells. To do this they used a gene from an ocean-dwelling bacterium that only turns on when many bacteria are present (the ocean bacterium uses the gene to detect when it has reached the light organ of a squid). It seems odd that bacteria can communicate, but it comes down to a simple mechanism made up of two genes. One gene encodes an enzyme that makes a chemical, called AI-1, that easily diffuses in and out through the cell membrane. The second encodes a gene activator that is turned on by high concentrations of AI-1. When there are many bacteria, the environment becomes rich in AI-1 and the gene activator turns on even more production of AI-1 and gene activators. This positive feedback creates a sensitive switch that flips quickly from all off to all on when the bacterial concentration crosses a certain level. By linking these genes to the infectious inv gene, the researchers created a bacterium that was only infectious when in high concentrations.
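To get a feel for why positive feedback gives such a sharp switch, here’s a toy Python model. To be clear, this is my own sketch and not from the paper: the steep (Hill-type) response curve is an assumption, and every number in it (basal, induced, threshold, steepness) and the function name are made up purely for illustration.

```python
def activation_at_density(density, basal=0.1, induced=1.0, threshold=1.0,
                          steepness=4, steps=200):
    """Toy model: how switched-on the activator ends up at a cell density.

    All parameter values are hypothetical, not measured quantities.
    """
    signal = 0.0  # shared AI-1 level, starting with none present
    activation = 0.0
    for _ in range(steps):
        # The activator responds steeply to the AI-1 level (Hill-type curve).
        activation = signal**steepness / (threshold**steepness + signal**steepness)
        # Each cell makes a little AI-1 by default and more once activated.
        per_cell = basal + induced * activation
        # AI-1 moves freely between cells, so the level scales with cell count.
        signal = density * per_cell
    return activation
```

With these made-up numbers the activator stays almost completely off at a density of 3 and is almost fully on at a density of 5, which is the all-off to all-on behaviour described above.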

So now we have bacteria that might be able to selectively infect tumor cells. By combining this selective invasiveness with cell-killing or immune-response-activating mechanisms, bacteria could become helpful tools for treating cancer (although there is still a pretty long way to go). The paper makes it look easy, but it must have taken a good bit of work to get it all working so nicely. They ended up using DNA from three different bacterial species and many different bacterial systems. It’s always really cool to see how scientists can take DNA “parts” and combine them to create new and useful functions, and even edit the DNA directly when the parts don’t fit correctly.

I guess the next step in the research is to figure out how to get a bacterium to sense both an anaerobic and a high-density environment. This might be a bit tricky since the two sensors would have to interact, but I see some of the same researchers also have a paper on creating bacterial AND gates so I’ll have to give that one a read too.


Anderson, J., Clarke, E., Arkin, A., Voigt, C. (2006). Environmentally Controlled Invasion of Cancer Cells by Engineered Bacteria. Journal of Molecular Biology, 355(4), 619-627. DOI: 10.1016/j.jmb.2005.10.076

