I have shotgun metagenome data and I want to pull out only the ribosomal sequences to study the taxonomy. I used the classify.seqs command with a bootstrap cutoff of 80. mothur is still running, but I can see that all the sequences are getting unknown taxonomy. I am wondering whether it would be a good idea to lower the bootstrap value, but I am afraid I would end up with more uncertain ribosomal sequences.
Also, is there any other way I can get the ribosomal sequences?
One idea might be to take several bona fide 16S rRNA gene sequences and blast them against your sequence collection to identify those reads with 16S in them and then process those further. The problem with your approach is that anything will classify to something if you push it hard enough.
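To make the BLAST idea a bit more concrete, here is a rough Python sketch of the filtering step. It assumes you have already run the 16S reference sequences as queries against a database built from your reads and saved the result in BLAST's 12-column tabular format (-outfmt 6), so the read ID sits in the second column. The function name, the example IDs and the cutoffs (80% identity, 100 bp alignment length, e-value 1e-10) are made up for illustration and are not recommendations; tune them for your read length.

```python
def reads_with_16s_hits(blast_lines, min_identity=80.0, min_length=100,
                        max_evalue=1e-10):
    """Collect read IDs that have at least one hit passing all cutoffs.

    Expects BLAST tabular (outfmt 6) lines with the default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore. The 16S references are the queries, the reads are
    the subjects. All thresholds are illustrative assumptions.
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular record
        read_id = fields[1]
        identity = float(fields[2])
        aln_length = int(fields[3])
        evalue = float(fields[10])
        if (identity >= min_identity and aln_length >= min_length
                and evalue <= max_evalue):
            hits.add(read_id)
    return hits


# Made-up example records (reference and read names are hypothetical).
example = [
    "ref16S_A\tread_001\t95.2\t150\t7\t0\t500\t649\t1\t150\t1e-60\t230",
    "ref16S_A\tread_002\t78.0\t40\t9\t0\t10\t49\t5\t44\t0.5\t30.1",
    "ref16S_B\tread_001\t93.0\t148\t10\t0\t502\t649\t3\t150\t3e-55\t212",
    "ref16S_B\tread_003\t88.5\t120\t14\t0\t700\t819\t1\t120\t2e-30\t130",
]
print(sorted(reads_with_16s_hits(example)))  # ['read_001', 'read_003']
```

The resulting ID list could then be used to subset the reads before classifying them, so that only reads with real similarity to 16S ever reach the classifier.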
It depends a bit on what you want to do with the data you get, but there are programs designed to try and extract/rebuild full-length 16S sequences from metagenomic data. EMIRGE is one (link). I’ve found it pretty temperamental; it seemed to build a lot of chimeric sequences with my data, but YMMV.
Also, I don’t know if I’m stating the obvious here, but because mothur reports every failure but only every 100th success, the log probably looks worse than it really is. For example, if you classified 1010 sequences (1000 successful, 10 failed) you would get 10 success messages (100, 200, 300, …, 1000) and 10 failure messages. Obviously you expect most of your data to fail since it’s WGS, but there are probably more successes than eyeballing the logfile would indicate.
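The arithmetic in that example can be written out as a small Python sketch. It takes the reporting behaviour described above at face value (one message per failure, one message per block of 100 successes); the function name and the block_size parameter are just for illustration and are not part of mothur.

```python
def estimate_successes(progress_messages, block_size=100):
    """Return (low, high) bounds on the number of successful sequences.

    Assumes one progress message is printed per completed block of
    `block_size` successes, so up to block_size - 1 further successes
    may not have been reported yet.
    """
    low = progress_messages * block_size
    high = low + block_size - 1
    return low, high


# Numbers from the example: 10 success messages and 10 failure messages.
progress_messages = 10
failure_messages = 10

low, high = estimate_successes(progress_messages)
print(low, high)  # 1000 1099

# Failures as a share of log messages vs. as a share of sequences.
share_of_messages = failure_messages / (progress_messages + failure_messages)
share_of_sequences = failure_messages / (low + failure_messages)
print(f"{share_of_messages:.1%} of messages, "
      f"at most {share_of_sequences:.1%} of sequences")
# 50.0% of messages, at most 1.0% of sequences
```

So in this example half of the log lines are failures even though only about 1% of the sequences failed.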
(1) A software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets (http://microbiology.se/software/metaxa/).
(2) An alignment-free algorithm for rapid in silico detection of ribosomal gene fragments from metagenomic sequence data sets (http://metagenomics.atc.tcs.com/i-rDNA/).