<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dammit Jim! &#187; count</title>
	<atom:link href="http://scott.sherrillmix.com/blog/tag/count/feed/" rel="self" type="application/rss+xml" />
	<link>http://scott.sherrillmix.com/blog</link>
	<description>I'm a biologist not a...</description>
	<lastBuildDate>Mon, 06 Feb 2012 05:19:08 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Counting Q20 Bases in a .qual File</title>
		<link>http://scott.sherrillmix.com/blog/biologist/counting-q20-bases-in-a-qual-file/</link>
		<comments>http://scott.sherrillmix.com/blog/biologist/counting-q20-bases-in-a-qual-file/#comments</comments>
		<pubDate>Wed, 18 Mar 2009 16:43:17 +0000</pubDate>
		<dc:creator>ScottS-M</dc:creator>
				<category><![CDATA[Bash/UNIX]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Biologist]]></category>
		<category><![CDATA[Programmer]]></category>
		<category><![CDATA[count]]></category>
		<category><![CDATA[grep]]></category>
		<category><![CDATA[Q20]]></category>
		<category><![CDATA[qual]]></category>
		<category><![CDATA[qualities]]></category>
		<category><![CDATA[sed]]></category>
		<category><![CDATA[sequencing]]></category>

		<guid isPermaLink="false">http://scott.sherrillmix.com/blog/?p=422</guid>
		<description><![CDATA[I sometimes get asked to count the number of bases with qualities greater than or equal to 20 in a quality file. I&#8217;m not entirely sure this is all that good a metric with 454 sequencing but that&#8217;s another story. It always takes me a minute or two to come up with the right Unix [...]]]></description>
			<content:encoded><![CDATA[<p>I sometimes get asked to count the number of bases with qualities greater than or equal to 20 in a quality file. I&#8217;m not entirely sure this is a good metric for 454 sequencing, but that&#8217;s another story. It always takes me a minute or two to come up with the right Unix commands, so I&#8217;m posting them here so I remember (and maybe to save someone else a couple of minutes).</p>
<p><code>
cat *qual|grep &#039;^[^&gt;]&#039;|sed &#039;s/ /\n/g&#039;|grep -c &#039;[234][0-9]&#039;
</code></p>
<p>This is very quick and dirty: it removes lines starting with &#8220;&gt;&#8221;, replaces spaces with newlines (the <code>\n</code> in the replacement relies on GNU sed) and counts the resulting lines that contain a two-digit run from 20 to 49, which covers the 20-40 quals I see in these files. It seems to work OK for me. And yes, I know it&#8217;s stupid to cat to a grep, but I often replace the cat with head for testing. I&#8217;m sure you could do it in a single awk or sed step, but it gets done in a minute or two for several hundred million bases, so I haven&#8217;t really been motivated to change it.</p>
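<p>For the record, here is a minimal sketch of what the single awk step might look like. The function name <code>count_quals</code> and the <code>threshold</code> variable are made up for illustration, and I haven&#8217;t timed it against the pipeline above. Note that it compares the scores numerically, so it counts everything at or above the threshold rather than pattern-matching two-digit values.</p>

```shell
# Hypothetical helper (name is illustrative): count quality scores at or
# above a threshold in one or more .qual files using a single awk pass.
count_quals() {
    threshold="$1"   # first argument: minimum quality to count
    shift            # remaining arguments: the .qual files
    awk -v threshold="$threshold" '
        /^>/ { next }                 # skip header lines starting with ">"
        {
            for (i = 1; i <= NF; i++) # each whitespace-separated field is one score
                if ($i + 0 >= threshold) n++
        }
        END { print n + 0 }           # n + 0 prints 0 if nothing was counted
    ' "$@"
}
```

<p>Assuming the files sit in the current directory, <code>count_quals 20 *qual</code> would stand in for the pipeline above, and swapping the 20 for another cutoff needs no change to the pattern.</p>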
]]></content:encoded>
			<wfw:commentRss>http://scott.sherrillmix.com/blog/biologist/counting-q20-bases-in-a-qual-file/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

