Statistician

Functional Metagenomics: Sequence Everything and Let DNA Sort The Functions Out


One of the cool things you can do with high-throughput pyrosequencing is to collect a sample from the environment, isolate the DNA from everything in it and sequence the lot. Then you can match the DNA against known sequences and see what sorts of microbes you had. Dinsdale and a long list of coauthors pooled the data from a number of such studies. They gathered 45 bacterial samples and 42 viral samples from 9 broad environmental classifications. The map below shows all the samples the authors pooled together (circles are microbial, squares are viral).

Locations of metagenomic samples from Dinsdale et al.

The interesting thing about this study was that instead of looking at the taxonomy of the critters as usual, they looked at the function of the genes. By simply looking at what the genes do, the researchers hoped to get a feel for what activities were going on in each environment without necessarily having to identify the species of the bacteria and viruses. To do this, they fed their 14.5 million sequences (pyrosequencing sure can generate data) into the SEED database, a big collection of genes that experts have assigned to functions (for example, membrane transport or sulfur metabolism). They were able to match 1 million of the bacterial and 500,000 of the viral sequences to previously identified gene functions.
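A quick bit of arithmetic puts those numbers in perspective. This is only a back-of-the-envelope sketch using the rounded counts quoted above (the variable names are mine, not anything from the paper):

```python
# Rounded counts as quoted in the post
total_sequences = 14_500_000
bacterial_matches = 1_000_000
viral_matches = 500_000

# Sequences that could be assigned to a known gene function
matched = bacterial_matches + viral_matches
matched_fraction = matched / total_sequences

print(f"{matched:,} of {total_sequences:,} sequences matched ({matched_fraction:.1%})")
```

So only roughly a tenth of the sequences could be tied to a known function, which says something about how much of this environmental DNA is still uncharacterized.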

It might seem odd that they would look at viral DNA since viruses are rather simple and have only a few basic genes. But the researchers were actually looking at bacterial genetic sequences being carried inside viruses. This of course brings up the question of what bacterial DNA is doing inside viruses. It turns out there are a lot of viruses, called bacteriophages, that infect bacteria, and sometimes these viruses capture some of the DNA of their bacterial hosts and carry it to their next host. Looking at the bacterial DNA present in a viral population gives an interesting look at what types of genes are being passed around between individual bacteria (and even between bacterial species).

So here are the high-level classifications of gene function they found for each environment.

Percentages of bacterial and viral gene functions from Dinsdale et al.
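To give a feel for how percentages like these come about, here is a minimal sketch that tallies function labels for one sample and converts the counts to percentages. The labels and their counts are invented for illustration, and `function_percentages` is a hypothetical helper, not the authors' pipeline:

```python
from collections import Counter

# Hypothetical function labels assigned to the matched sequences of one sample
hits = [
    "respiration", "cell wall", "respiration", "sulfur metabolism",
    "membrane transport", "respiration", "cell wall", "membrane transport",
]

def function_percentages(labels):
    """Return the percentage of matched sequences in each functional category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}

percentages = function_percentages(hits)
for name, pct in sorted(percentages.items()):
    print(f"{name:20s} {pct:5.1f}%")
```

Working in percentages rather than raw counts is what lets samples with very different sequencing depths be compared side by side.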

It’s pretty cool that the viruses were carrying around such a variety of bacterial DNA. The authors suggest that motility genes coding for things like flagella and cilia (which could help the bacterial host spread the virus further) were enriched in the viral samples, but it seems a bit hard to say that for certain without a bit more analysis.

A useful way to look at a huge mass of data, like their 1.5 million matches, is to reduce the counts in all the functional categories to a couple of condensed variables. The next couple of plots show the result, and they could use a little explaining. Bacterial sequences are on top and viral sequences are on the bottom. The lines show how the various functional categories have been condensed into the x and y variables. For example, in the bacterial plots, samples that contained lots of genes for making cell walls tend to sit at the top and tend not to have many genes for respiration.

Canonical discriminant function analysis of bacterial and viral gene function from Dinsdale et al.
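For anyone curious what "condensing" many categories into one axis looks like in practice, here is a rough sketch. Note that it swaps in a simpler technique: it finds the first principal component by power iteration, whereas the paper's figure is a canonical discriminant analysis, which also uses the environment labels. The sample names and percentages are invented, and all function names are my own:

```python
import math

# Hypothetical percentages of three functional categories per sample.
# These numbers are invented for illustration, not taken from the paper.
CATEGORIES = ["cell wall", "respiration", "sulfur metabolism"]
samples = {
    "fish farm A": [10.0, 8.0, 12.0],
    "fish farm B": [11.0, 7.0, 13.0],
    "coral A": [9.0, 15.0, 4.0],
    "coral B": [8.0, 16.0, 5.0],
}

def center(rows):
    """Subtract each column's mean so every category averages to zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n = len(centered)
    p = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

def leading_axis(cov, iterations=200):
    """Approximate the top eigenvector of cov by power iteration."""
    p = len(cov)
    # Uneven starting vector so it is unlikely to be orthogonal to the answer
    v = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

centered = center(list(samples.values()))
axis = leading_axis(covariance(centered))

# One condensed score per sample: its position along the new axis
scores = {
    name: sum(a * x for a, x in zip(axis, row))
    for name, row in zip(samples, centered)
}

for category, weight in zip(CATEGORIES, axis):
    print(f"weight for {category:18s} {weight:+.2f}")
for name, score in scores.items():
    print(f"score for {name:12s} {score:+.2f}")
```

The weights play the role of the lines in the plots (how much each category pulls a sample along the axis), and the scores are where each sample lands. With this made-up data the two fish farm samples end up on one side of the axis and the two coral samples on the other.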

It’s pretty cool to see how samples clustered with other samples from the same environment. For example, all the yellow diamond fish farm samples ended up on the right side of the bacteria graphs even though they were sampled independently. Gene function appears to correlate with environmental conditions. For example, the fish food at the fish farms contained a lot of sulfur supplements, and the bacteria from those samples were rich in sulfur metabolism genes. Similarly, the bacteria from corals contained many different respiration genes to deal with the highly variable oxygen concentrations found there. Dinsdale and her coauthors go so far as to suggest that gene function may provide a better indicator of environment than the taxonomy of the bacteria present.

The paper did have a little trouble with the math in one part, but the authors already have a correction in for it so it’s really not worth worrying about. Overall, it was a pretty interesting story and a good example of stuff to do with a sequencing machine (it must also have taken a good bit of work to collect all that data together from all those authors).

References

Elizabeth A. Dinsdale, Robert A. Edwards, Dana Hall, Florent Angly, Mya Breitbart, Jennifer M. Brulc, Mike Furlan, Christelle Desnues, Matthew Haynes, Linlin Li, Lauren McDaniel, Mary Ann Moran, Karen E. Nelson, Christina Nilsson, Robert Olson, John Paul, Beltran Rodriguez Brito, Yijun Ruan, Brandon K. Swan, Rick Stevens, David L. Valentine, Rebecca Vega Thurber, Linda Wegley, Bryan A. White, Forest Rohwer (2008). Functional metagenomic profiling of nine biomes. Nature, 452 (7187), 629-632. DOI: 10.1038/nature06810

Biologist
Statistician


Getting Help with SAS

SAS source code

There was some discussion in one of my SAS posts about where to find SAS help and communities. It seemed like a pretty useful topic so I thought I’d expand it a bit and make a post out of it. First, let me say I’m not the most knowledgeable since I’m more of a find-wall-bang-head type of programmer but I did my best to dig up some possible answers. If anyone has any other suggestions, feel free to leave them in the comments.

  • To start with, there’s always the official online documentation although this tends to be more for polishing something you already know how to do than starting cold.
  • Speaking of official, there’s also the official SAS forums. I didn’t know about these until I started looking around for this post so I can’t say much about them but the topics they have available seem rather specific and I can’t figure out where one would go to post a basic question.
  • Edit: There’s also the SAS Knowledge Base, which has a lot of good papers and notes detailing SAS features, complete with sample code and explanations. It’s really useful if you’re a learn-by-example type. (Thanks to Alison for pointing this one out.)
  • Kelly Levoyer of SAS points out SAScommunity.org, which seems a little sparse but does have a surprisingly long list of SAS-related blogs.
  • The SAS company also appears to have jumped on the blogging bandwagon, although really only SAS Dummy looks helpful for learning SAS at the moment.
  • The only place that seems to be available for asking general questions is the SAS-L email list (which I just found out is the same as the comp.soft-sys.sas Usenet group). There’s a nice paper on SAS-L etiquette (mostly: do your homework first), found via the sascommunity site.

Offline, there are also SAS user groups. I often get emails from our local one but I’ve never actually gone. The SAS company also has trainers who travel and teach quick classes. Our university stats department brought one in to teach a couple of short two-day classes about statistical functions and macros. The classes were pretty good, although I’m not sure how much they cost or how often they are offered. It might be worth checking on if you’re near a university.

Finally, you can also read my poor attempts at explaining SAS macro variables and SAS macros. Also, if you have any specific questions you can try asking in the comments here and if it’s not too time consuming I’ll try to lend a hand.

Programmer
SAS
Statistician


SAS Macros: Letting SAS Do the Typing

I’ve been meaning to write up a bit on using macros in SAS to complement my previous post on macro variables for quite a while. Luckily Norwegian guy reminded me about the pain of starting programming in SAS and provided me some motivation. So here’s my take on using macros in programming.

Continue Reading »

Programmer
SAS
Statistician


WP_MonsterID and Statistics

An example of a MonsterID

After I made the WP_MonsterID WordPress plugin, which creates a random monster avatar from an assortment of parts for each commenter (based on other people’s code), fruityoaty asked, “This looks nifty, but how many monster images are available for assigning?”
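The counting behind a question like this is just multiplication, assuming each monster is assembled by picking one image per body part independently. Here is a minimal sketch; the part names and the number of images per part are made up for illustration and are not the plugin's actual values:

```python
from math import prod

# Hypothetical number of available images for each body part
part_counts = {
    "legs": 5,
    "arms": 5,
    "hair": 5,
    "body": 10,
    "eyes": 10,
    "mouth": 10,
}

# One independent choice per part, so the counts multiply
total_monsters = prod(part_counts.values())
print(f"{total_monsters:,} distinct monsters")
```

Even a modest handful of images per part multiplies out to a large number of distinct monsters, which is what makes this kind of avatar scheme workable.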

Continue Reading »

Programmer
Statistician
Web


xkcd Geek Comic Site

I’ve been running into math and programming related comics for a while now and always wondered where they were coming from. Today I finally ran across the source. From the topics, it appears the guy is some sort of computer networking mathy type. Some of them are beyond me, and I’ve actually learned a bit by googling the ones I didn’t understand, like the Alice and Bob one. Anyway, here are a few samples:

Continue Reading »

Programmer
Statistician
