{"id":422,"date":"2009-03-18T12:43:17","date_gmt":"2009-03-18T16:43:17","guid":{"rendered":"http:\/\/scott.sherrillmix.com\/blog\/?p=422"},"modified":"2009-03-18T12:43:17","modified_gmt":"2009-03-18T16:43:17","slug":"counting-q20-bases-in-a-qual-file","status":"publish","type":"post","link":"http:\/\/scott.sherrillmix.com\/blog\/biologist\/counting-q20-bases-in-a-qual-file\/","title":{"rendered":"Counting Q20 Bases in a .qual File"},"content":{"rendered":"

I sometimes get asked to count the number of bases with quality scores greater than or equal to 20 in a quality file. I’m not entirely sure this is all that good a metric with 454 sequencing, but that’s another story. It always takes me a minute or two to come up with the right Unix commands, so I’m posting them here so I remember (and maybe save someone else a couple of minutes).<\/p>\r\n

\r\ncat *qual|grep '^[^>]'|sed 's\/ \/\\n\/g'|grep -c '[234][0-9]'\r\n<\/code><\/p>\r\n
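For comparison, here is a rough sketch of the same count done in a single awk step. The function name count_q20 is made up for illustration, and the sketch assumes whitespace-separated integer scores with header lines starting with >. Unlike the pattern match in the one-liner, it compares each score numerically, so any value of 20 or more is counted.

```shell
# Hypothetical helper (not from the original post): reads .qual-style text
# on stdin and prints how many quality scores are >= 20.
count_q20() {
  # Skip header lines starting with >, then test every field on the line.
  # Adding 0 forces a numeric comparison; n + 0 prints 0 for empty input.
  awk '!/^>/ { for (i = 1; i <= NF; i++) if ($i + 0 >= 20) n++ } END { print n + 0 }'
}
```

It could be used the same way as the pipeline, for example: cat *qual | count_q20 (or with head in place of cat for a quick test).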

This is very quick and dirty (just removing lines starting with “>”, replacing spaces with newlines and counting the resulting lines with quals 20-40) but it seems to work ok for me. Also yes I know it’s stupid to cat to a grep but I often replace the cat with head for testing. And I’m sure you could do it in a single awk or sed step but it gets done in a minute or two for several hundred million bases so I haven’t really been motivated to change it.<\/p>\r\n","protected":false},"excerpt":{"rendered":"I sometimes get asked to count the number of bases with qualities greater than or equal to 20 in a quality file. I’m not entirely sure this is all that good a metric with 454 sequencing but that’s another story. It always takes me a minute or two to come up with the right Unix […]","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,423,4,5],"tags":[706,463,465,464,462,461,466,376],"_links":{"self":[{"href":"http:\/\/scott.sherrillmix.com\/blog\/wp-json\/wp\/v2\/posts\/422"}],"collection":[{"href":"http:\/\/scott.sherrillmix.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/scott.sherrillmix.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/scott.sherrillmix.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/scott.sherrillmix.com\/blog\/wp-json\/wp\/v2\/comments?post=422"}],"version-history":[{"count":3,"href":"http:\/\/scott.sherrillmix.com\/blog\/wp-json\/wp\/v2\/posts\/422\/revisions"}],"predecessor-version":[{"id":425,"href":"http:\/\/scott.sherrillmix.com\/blog\/wp-json\/wp\/v2\/posts\/422\/revisions\/425"}],"wp:attachment":[{"href":"http:\/\/scott.sherrillmix.com\/blog\/wp-json\/wp\/v2\/media?parent=422"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/scott.sherrillmix.com\/blog\/wp-json\/wp\/v2\/categories?post=422"},{"taxo
nomy":"post_tag","embeddable":true,"href":"http:\/\/scott.sherrillmix.com\/blog\/wp-json\/wp\/v2\/tags?post=422"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}