Counting Q20 Bases in a .qual File
I sometimes get asked to count the number of bases with qualities greater than or equal to 20 in a quality file. I’m not entirely sure this is all that good a metric with 454 sequencing, but that’s another story. It always takes me a minute or two to come up with the right Unix commands, so I’m posting them here so I remember (and maybe save someone else a couple of minutes).
cat *qual|grep '^[^>]'|sed 's/ /\n/g'|grep -c '[234][0-9]'
This is very quick and dirty (just removing lines starting with “>”, replacing spaces with newlines and counting the resulting lines that contain a two-digit number from 20 to 49, which covers quals 20-40), but it seems to work OK for me. The final pattern is quoted so the shell can’t expand it as a filename glob before grep sees it. Also, yes, I know it’s stupid to cat to a grep, but I often replace the cat with head for testing. And I’m sure you could do it in a single awk or sed step, but it gets done in a minute or two for several hundred million bases, so I haven’t really been motivated to change it.
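For what it’s worth, here is a rough sketch of what the single awk step might look like. I haven’t timed it against the pipeline above, and the function name count_q20 is just something made up for this example. It assumes the usual .qual layout: header lines starting with “>” and whitespace-separated integer qualities on every other line. Unlike the grep version it compares numerically, so it counts any value of 20 or more rather than only 20-49.

```shell
# Hypothetical helper: count quality values >= 20 in one or more .qual files.
# Assumes ">" header lines and whitespace-separated integer qualities.
count_q20() {
    awk '
        /^>/ { next }                     # skip the header lines
        {
            for (i = 1; i <= NF; i++)     # each field on the line is one base quality
                if ($i + 0 >= 20) n++     # "+ 0" forces a numeric comparison
        }
        END { print n + 0 }               # print 0 instead of a blank line if nothing matched
    ' "$@"
}
```

Run it as `count_q20 *qual`, or feed it `head` output through a pipe for testing, since awk reads standard input when no files are given.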