classify.seqs across multiple domains

I have a question about implementing classify.seqs across both the archaeal and bacterial domains.

Currently the Silva taxonomy reference files are split into separate Archaea and Bacteria sets. If you have a 16S fasta file that contains both archaeal and bacterial sequences, what would be the best way to get the “top” classification?

I have run the 16S sequences against both the Bacteria and Archaea references separately. It would seem that the logical thing would be to choose the classification which has the higher bootstrap value. Is this the only way to do such a classification?
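The "pick the higher bootstrap" idea could be sketched roughly as below. This is only an illustration under some assumptions: each classify.seqs taxonomy file has one line per sequence in the form name<TAB>Taxon(bootstrap);Taxon(bootstrap);..., the function names (parse_taxonomy_line, bootstrap_at, pick_best) are made up for this example, and the comparison is done at the second rank rather than at the Domain level, since a single-domain reference should give every sequence a Domain bootstrap of 100. Ties go to the bacterial result, which is an arbitrary choice.

```python
import re

# Matches one rank such as "Firmicutes(98)"; a rank without "(n)" does not match.
_RANK = re.compile(r"^(.*)\((\d+)\)$")

def parse_taxonomy_line(line):
    """Turn 'seq1<TAB>Bacteria(100);Firmicutes(98);' into (name, [(taxon, bootstrap), ...])."""
    name, taxonomy = line.rstrip("\n").split("\t", 1)
    ranks = []
    for field in taxonomy.strip().split(";"):
        if not field:
            continue  # skip the empty piece after the trailing ';'
        m = _RANK.match(field)
        # Ranks with no bootstrap (e.g. "unclassified") are scored as 0.
        ranks.append((m.group(1), int(m.group(2))) if m else (field, 0))
    return name, ranks

def bootstrap_at(ranks, rank):
    """Bootstrap at a 0-based rank index, or 0 if the lineage is shorter than that."""
    return ranks[rank][1] if rank < len(ranks) else 0

def pick_best(bacterial_lines, archaeal_lines, rank=1):
    """For each sequence keep the lineage with the higher bootstrap at `rank`.

    Assumption: rank=1 (the level below Domain) is a sensible place to compare.
    Ties, and sequences missing from the archaeal run, keep the bacterial result.
    """
    bac = dict(parse_taxonomy_line(l) for l in bacterial_lines if l.strip())
    arc = dict(parse_taxonomy_line(l) for l in archaeal_lines if l.strip())
    best = {}
    for name in bac.keys() | arc.keys():
        b = bac.get(name, [])
        a = arc.get(name, [])
        best[name] = a if bootstrap_at(a, rank) > bootstrap_at(b, rank) else b
    return best

# Made-up example lines, not real classifier output.
bac_lines = ["seq1\tBacteria(100);Firmicutes(98);",
             "seq2\tBacteria(100);Proteobacteria(35);"]
arc_lines = ["seq1\tArchaea(100);Euryarchaeota(12);",
             "seq2\tArchaea(100);Crenarchaeota(91);"]
best = pick_best(bac_lines, arc_lines)
```

With real data, the two line lists would come from reading the two taxonomy files produced by the separate runs. Whether a bootstrap from one reference is directly comparable to a bootstrap from the other is exactly the open question in this thread, so treat the output as a starting point rather than a final assignment.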

I tried concatenating the bacteria and archaea nogap.fasta and .tax files, but the results did not seem to support the idea that the classification with the highest bootstrap would be the assigned taxonomy. Instead, many of the sequences had poor bootstrap values at the Domain level, especially for the Archaea. Is there any way to adjust this? Is it because the library size of each Domain is different?

Thanks
Ben

Hmmm… Interesting question. I’m surprised that concatenating the databases didn’t work. The classifier may have trouble assigning sequences because the archaeal taxonomy is pretty weak compared to the bacterial taxonomy. Also, sometimes more information just confuses the classifier. You could try running classify.seqs with method=knn, search=distance, numwanted=1 to see what the % similarity is between your sequences and the concatenated database. This would require inputting aligned sequences and tends to be a bit slow (http://www.mothur.org/wiki/Classify.seqs#search.3Ddistance).

Another alternative would be to use the mothur version of the training set that the RDP uses, which is available from our website and includes Archaea (http://www.mothur.org/wiki/RDP_reference_files).