metagenomic rRNA data

Hello,

I have a fasta file that contains the rRNA-related sequences from my metagenomic data. The fasta file was generated by MG-RAST, whose website describes it as follows: “Sequences are pre-screened using qiime-uclust for at least 70% identity to ribosomal sequences from the following RNA databases (Greengenes, LSU, SSU, and RDP).”

I also did 16S rRNA barcode sequencing for these samples. It is easy to use the barcode sequencing data to generate a taxa plot and a rarefaction curve in mothur. I want to know whether I can use the rRNA-related sequences retrieved from my metagenomic data to do the same analysis, so that I can compare the metagenomic rRNA results with the 16S rRNA results.
Here are my questions:
1> The metagenomic rRNA dataset might contain both LSU and SSU sequences, but it has no barcodes (the whole fasta file comes from one sample). Where should I start? All chimeras have been removed. I am not sure whether the sequences are aligned. Does mothur support an LSU database? I want to build an OTU table at a 97% cutoff first.

2> If I can build an OTU table, what command should I use next to build a rarefaction curve? I don’t need to calculate alpha diversity; I just want a rarefaction curve based on observed species (e.g., the x axis is the number of sequences and the y axis is the number of observed OTUs).

3> If mothur can’t do this, is there any software that can extract all rRNA-related sequences from metagenomic data and analyze them?
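For question 1, it may help to see what “an OTU table at a 97% cutoff” amounts to for a single, unbarcoded sample. The sketch below is a hypothetical toy, not mothur’s clustering algorithm: it assumes the reads are already aligned to a common reference (every string has the same length, with `-` or `.` as gap characters) and uses simple greedy centroid clustering, computing identity only over columns where both reads have a base. The names `aligned_identity`, `greedy_otus`, and `otu_table` are made up for illustration.

```python
from __future__ import annotations

# Hypothetical toy example: greedy centroid clustering of PRE-ALIGNED reads at a
# 97% identity cutoff. This is NOT how mothur clusters; it only illustrates
# what an OTU table for a single unbarcoded sample looks like.

GAP_CHARS = set("-.")  # assumed gap characters in the alignment


def aligned_identity(a: str, b: str) -> float | None:
    """Identity over columns where BOTH aligned reads have a base.

    Returns None when the reads share no such columns, i.e. they come from
    non-overlapping parts of the gene and cannot be compared at all.
    """
    if len(a) != len(b):
        raise ValueError("reads must come from the same alignment")
    shared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # skip columns where either read has a gap
        shared += 1
        if x == y:
            matches += 1
    return matches / shared if shared else None


def greedy_otus(seqs: dict[str, str], cutoff: float = 0.97) -> dict[str, list[str]]:
    """Assign each read to the first OTU centroid it matches at >= cutoff identity."""
    otus: dict[str, list[str]] = {}  # centroid name -> member read names
    for name, seq in seqs.items():
        for centroid in otus:
            ident = aligned_identity(seq, seqs[centroid])
            if ident is not None and ident >= cutoff:
                otus[centroid].append(name)
                break
        else:
            otus[name] = [name]  # no centroid matched: this read founds a new OTU
    return otus


def otu_table(otus: dict[str, list[str]]) -> dict[str, int]:
    """Collapse OTU membership into counts (one sample, so one column)."""
    return {centroid: len(members) for centroid, members in otus.items()}
```

Note what happens to two fragments from non-overlapping parts of the gene: they share no columns, cannot be compared, and each ends up as its own OTU. With shotgun-derived rRNA reads this inflates OTU counts, which is the overlap problem raised later in this thread.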

Thanks,
Ben

Yep, mothur can do all of this. I would take a look at the SOPs and pick up at the align.seqs step.
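For question 2, the curve being described plots the expected number of observed OTUs against the number of sequences sampled. As I understand it, mothur’s rarefaction.single command estimates this by repeated random subsampling (its default calculator being sobs). The sketch below swaps in the closed-form hypergeometric rarefaction formula instead, which gives the exact expectation that repeated subsampling approximates. The function names are hypothetical, and the input is assumed to be a list of per-OTU read counts for one sample.

```python
from __future__ import annotations

from math import comb


def expected_sobs(otu_counts: list[int], n: int) -> float:
    """Expected number of observed OTUs in a random subsample of n reads.

    Closed-form (hypergeometric) rarefaction:
        E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)]
    where N is the total read count and N_i the count of OTU i.
    """
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of reads")
    all_subsets = comb(total, n)
    # OTU i is missed only if all n reads come from the other total - N_i reads;
    # math.comb returns 0 when n exceeds total - N_i, so such OTUs are always seen.
    return sum(1 - comb(total - c, n) / all_subsets for c in otu_counts)


def rarefaction_curve(otu_counts: list[int], step: int = 1) -> list[tuple[int, float]]:
    """(reads sampled, expected observed OTUs) points; x = reads, y = OTUs."""
    total = sum(otu_counts)
    depths = list(range(step, total + 1, step))
    if not depths or depths[-1] != total:
        depths.append(total)  # always include the full sample as the last point
    return [(n, expected_sobs(otu_counts, n)) for n in depths]
```

For a sample with OTU counts `[2, 1, 1]`, the curve rises from 1 expected OTU at one read to all 3 OTUs at four reads; plotting the returned pairs gives exactly the x = sequences, y = observed OTUs picture asked for.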

However… why bother? Let me play devil’s advocate. There are at least two big problems with running an OTU-based analysis on SSU/LSU sequences extracted from metagenomes. First, metagenomes are good at revealing the functional potential of a community, not its taxonomy. For $50 you can get a parallel rRNA amplicon dataset, save yourself $5000 worth of headaches, and get information that is far more valuable. Frankly, as a reviewer, if I get a manuscript that has a shotgun dataset but no 16S data, I balk. Second, the shotgun reads will not all start and stop at the same place. Furthermore, the number of rRNA gene reads you will get will be very small (about 1 in 1000); combined with the requirement that they all overlap the same region, as an OTU-based approach demands, this seems like a fool’s errand.

Now, I get that people want to see where the PCR biases are. Just remember that there are PCR steps in shotgun library construction too. If this is the question you are interested in, then I would run all the rRNA genes through a classifier and compare the classifications of the rRNA shotgun fragments with those of the amplicons. The comparison would need to be done at a very broad level, since the short shotgun reads won’t classify very well and different regions of the rRNA gene classify with varying efficiency. You can certainly build your own taxonomic databases and do the classifications within mothur.
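The “about 1 in 1000” and same-region points can be put into rough numbers. This is a back-of-the-envelope sketch under loudly simplified assumptions: reads have a fixed length and start uniformly at any position that keeps them inside a single rRNA gene, every rRNA read comes from that gene, and an OTU analysis needs each read to span the same target window completely. The function names and all parameter values used with them are hypothetical, not taken from any dataset.

```python
# Back-of-the-envelope sketch with HYPOTHETICAL parameters: how many shotgun
# reads would fully span one fixed region of an rRNA gene?
# Assumptions: fixed read length, reads start uniformly at any position that
# keeps them inside one gene, and every rRNA read comes from that gene.


def p_spans_window(gene_len: int, read_len: int, win_start: int, win_len: int) -> float:
    """Probability that a read fully covers [win_start, win_start + win_len), 0-based."""
    if not (0 < read_len <= gene_len and win_len > 0 and 0 <= win_start
            and win_start + win_len <= gene_len):
        raise ValueError("read and window must fit inside the gene")
    n_starts = gene_len - read_len + 1  # possible start positions 0 .. gene_len - read_len
    win_end = win_start + win_len  # exclusive
    # A read starting at s covers [s, s + read_len); it spans the window when
    # s <= win_start and s + read_len >= win_end.
    lo = max(0, win_end - read_len)
    hi = min(win_start, n_starts - 1)
    return max(0, hi - lo + 1) / n_starts


def expected_spanning_reads(total_reads: int, rrna_fraction: float, gene_len: int,
                            read_len: int, win_start: int, win_len: int) -> float:
    """Expected number of reads that are rRNA AND span the target window."""
    return total_reads * rrna_fraction * p_spans_window(gene_len, read_len, win_start, win_len)
```

With a hypothetical 1,000,000 reads, 1/1000 of them rRNA, a 1500 bp gene, 150 bp reads, and a 100 bp window starting at position 500, only about 38 reads are expected to span the window; a window longer than the reads (e.g. 250 bp) is never spanned at all.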

Sorry if this seems like I am being pedantic or ranting, but there is a sense out there that shotgun metagenomics can solve all ills in microbial ecology. It’s great for some things and amplicon libraries are great for some things. They just aren’t the same things!

Pat

Hi Pat,

Thank you very much! I read some papers published recently, and most of the authors extracted the 16S rRNA sequences from their metagenomic data and analyzed them. Then they compared those data with their 16S rRNA barcode sequencing data. I agree with you. They might want to see whether there are any PCR biases. However, it is hard to tell, since the number of rRNA genes they got from the metagenomic data was very small.
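A broad-level comparison of the kind discussed here (classify both the shotgun rRNA fragments and the amplicons, then compare at a high rank) can be sketched as follows. It assumes each sequence already has a semicolon-delimited lineage with optional confidence values in parentheses, such as `Bacteria(100);Firmicutes(99);`, which is my understanding of the format mothur’s classify.seqs writes; check your own files. Collapsing to phylum (rank index 1) and reporting per-taxon differences plus a Bray-Curtis dissimilarity is an illustrative choice, and all function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumed lineage format: "Bacteria(100);Firmicutes(99);Clostridia(90);"
CONFIDENCE = re.compile(r"\(\d+(?:\.\d+)?\)$")  # trailing "(99)" confidence value


def taxon_at_rank(lineage: str, rank_index: int) -> str:
    """Name at a given rank (0 = domain, 1 = phylum, ...) or 'unclassified'."""
    levels = [CONFIDENCE.sub("", part.strip())
              for part in lineage.strip().split(";") if part.strip()]
    return levels[rank_index] if len(levels) > rank_index else "unclassified"


def relative_abundance(lineages: list[str], rank_index: int = 1) -> dict[str, float]:
    """Collapse lineages to one rank and return relative abundances summing to 1."""
    counts = Counter(taxon_at_rank(lin, rank_index) for lin in lineages)
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


def compare_profiles(a: dict[str, float], b: dict[str, float]) -> tuple[dict[str, float], float]:
    """Per-taxon difference (a - b) and Bray-Curtis dissimilarity of two profiles."""
    taxa = sorted(set(a) | set(b))
    diffs = {t: a.get(t, 0.0) - b.get(t, 0.0) for t in taxa}
    # For relative abundances that each sum to 1, Bray-Curtis = sum|a - b| / 2.
    bray_curtis = sum(abs(d) for d in diffs.values()) / 2
    return diffs, bray_curtis
```

Reads whose lineage stops above the chosen rank are counted as `unclassified`, which keeps poorly classified short shotgun fragments visible in the comparison rather than silently dropping them.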

I think they were just showing off, because most of them got the same taxonomic distribution from 16S rRNA barcode sequencing and from metagenomic rRNA. :mrgreen:

Thanks again,
Pet