Finding .taxonomy files for Classify.seqs?

azle694 · December 2, 2009, 9:31pm

Does anyone know how the SILVA .taxonomy file was obtained or created? I would like to try using the aligned 16S databases from RDP or greengenes as templates for the classify.seqs function, but I haven’t been able to find the corresponding .taxonomy files.

pschloss · December 3, 2009, 1:30pm

Hi there azle694,
If you could give me a few days, I’ll post the analogous files for greengenes, NCBI, and RDP. Basically, I extracted the NDS information corresponding to the SILVA taxonomy from the SILVA-provided ARB database. I then kept only those sequences that were in the reference alignment (http://www.mothur.org/wiki/Alignment_database; note: this is my attempt to reconstruct the SILVA SEED database). Finally, I converted any spaces in the taxonomy strings to underscores (“_”) and added a semicolon (“;”) to the end of each line.
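The recipe above (export the taxonomy field, keep only sequences in the reference alignment, swap spaces for underscores, end each line with a semicolon) could be sketched in Python roughly like this. It assumes the taxonomy field has already been exported from ARB as a two-column, tab-separated text file (sequence name, taxonomy string); that export format and all function names are hypothetical, not part of mothur or ARB. The output uses the two-column, tab-delimited layout of mothur taxonomy files.

```python
import io
from typing import Dict, Set, TextIO


def read_fasta_ids(handle: TextIO) -> Set[str]:
    """Collect sequence names from a FASTA file (text after '>' up to the first whitespace)."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def format_taxonomy(raw: str) -> str:
    """Turn spaces into underscores and make sure the string ends with ';'."""
    tax = raw.strip().replace(" ", "_")
    return tax if tax.endswith(";") else tax + ";"


def build_taxonomy(export: TextIO, keep_ids: Set[str]) -> Dict[str, str]:
    """Read 'name<TAB>taxonomy' lines (hypothetical export) and keep only reference-alignment names."""
    taxonomy: Dict[str, str] = {}
    for line in export:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, raw = parts[0].strip(), parts[1]
        if name in keep_ids:
            taxonomy[name] = format_taxonomy(raw)
    return taxonomy


def write_taxonomy(taxonomy: Dict[str, str], out: TextIO) -> None:
    """Write one 'name<TAB>taxonomy' line per sequence."""
    for name, tax in taxonomy.items():
        out.write(f"{name}\t{tax}\n")


if __name__ == "__main__":
    # Toy inputs standing in for the reference alignment and the exported ARB field.
    reference = io.StringIO(">seqA\nACGU\n>seqB\nGGCC\n")
    export = io.StringIO(
        "seqA\tBacteria;Firmicutes;Bacilli\n"
        "seqB\tBacteria;Candidate division OP11\n"
        "seqC\tArchaea;Euryarchaeota\n"  # not in the reference alignment, so dropped
    )
    out = io.StringIO()
    write_taxonomy(build_taxonomy(export, read_fasta_ids(reference)), out)
    print(out.getvalue(), end="")
```

With the toy data, only seqA and seqB are written, and "Candidate division OP11" becomes "Candidate_division_OP11;".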

This database has just over 14,000 sequences in it. I think anyone should be able to use this type of approach to construct a similar database for any gene family, or a more complete database. The nice thing about the ARB database is that it has fields for the different taxonomies; I think the greengenes dataset does as well. A big unresolved question is how big the database should be. Have fun exploring!

Pat

wern0122 · May 10, 2010, 5:34pm

Hi - I have a similar question about your SILVA taxonomy. I’d like to directly compare classification results from different methods, so it would be very helpful if they’re on the same hierarchy. Is your RDP hierarchy mapping for the SILVA seed database based on RDP Training Set 4, or the newer (early 2010) Training Set 5 or 6? (I hear the new RDP hierarchy reorganized some of the Firmicutes and Cyanobacteria, and I believe this is now the default on their website.) Thank you very much for your time and your extremely useful software!

pschloss · May 11, 2010, 10:10am

Our RDP taxonomy outline is what the SILVA folks pulled out of the RDP. The actual RDP training set is available, but we opted for SILVA because we were able to make it more comprehensive.

wern0122 · May 11, 2010, 2:53pm

Thanks for the fast reply! So the silva.rdp.taxonomy mapping file is just the result of running the SILVA subset through the RDP classifier? I’m interested in trying a number of different 16S classification databases, but I’d like to have them all mapped to the same hierarchy if possible (for comparison). Or perhaps this is a misinformed effort on my part?

pschloss · May 11, 2010, 6:02pm

Not exactly. I got the RDP taxonomy outline from the appropriate field in the SILVA-provided ARB database. They also provide the greengenes, NCBI/EMBL, and SILVA taxonomies for each sequence.
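Because each sequence in the ARB database carries several taxonomies, one way to see how far two hierarchies diverge on the same sequences is to line up two mothur-format .taxonomy files (for example, one built from the RDP field and one from the SILVA field) and measure rank-by-rank agreement. This is a minimal sketch with hypothetical function names; names that differ only in spelling, or hierarchies that use a different number of ranks, will count as disagreement.

```python
from typing import Dict, List


def parse_taxonomy(text: str) -> Dict[str, List[str]]:
    """Parse 'name<TAB>Rank1;Rank2;...;' lines into {name: [Rank1, Rank2, ...]}."""
    lineages: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        name, tax = line.split("\t", 1)
        lineages[name.strip()] = [rank for rank in tax.strip().split(";") if rank]
    return lineages


def agreement_by_depth(
    a: Dict[str, List[str]], b: Dict[str, List[str]], max_depth: int
) -> List[float]:
    """For sequences present in both files, the fraction whose lineages match down to each depth.

    A sequence agrees at depth d only if both lineages have at least d ranks and
    their first d rank names are identical.
    """
    shared = set(a) & set(b)
    fractions: List[float] = []
    for depth in range(1, max_depth + 1):
        if not shared:
            fractions.append(0.0)
            continue
        agree = sum(
            1
            for name in shared
            if len(a[name]) >= depth
            and len(b[name]) >= depth
            and a[name][:depth] == b[name][:depth]
        )
        fractions.append(agree / len(shared))
    return fractions


if __name__ == "__main__":
    # Toy lineages, not real database entries.
    rdp = parse_taxonomy("s1\tBacteria;Firmicutes;Bacilli;\ns2\tBacteria;Cyanobacteria;\n")
    silva = parse_taxonomy("s1\tBacteria;Firmicutes;Clostridia;\ns2\tBacteria;Cyanobacteria;Chloroplast;\n")
    print(agreement_by_depth(rdp, silva, 3))  # [1.0, 1.0, 0.0]
```

Requiring both lineages to reach depth d keeps a short lineage from being counted as agreeing with a deeper one.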

wern0122 · May 11, 2010, 6:17pm

Thanks Pat! I appreciate your helpfulness. I’ll just try both rdp.taxonomy and slv.taxonomy, and if neither gives me an overabundance of “Bacteria;unclassified_Bacteria” I’ll run with the RDP hierarchy. Thanks again.
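To compare runs against rdp.taxonomy and slv.taxonomy this way, one could count how many sequences end up unclassified at a given rank in each classify.seqs output. A minimal sketch, assuming the output is a two-column, tab-separated taxonomy file whose ranks may carry a confidence value in parentheses and whose unassigned ranks start with "unclassified"; check these assumptions against your own output files. The function names are hypothetical.

```python
import re
from typing import List

# Assumed format: a confidence score in parentheses after each rank,
# e.g. "Bacteria(100);Firmicutes(97);". This pattern strips those scores.
CONFIDENCE = re.compile(r"\(\d+(?:\.\d+)?\)")


def lineage(tax: str) -> List[str]:
    """Split a taxonomy string into rank names, dropping any '(NN)' confidence suffix."""
    return [CONFIDENCE.sub("", rank) for rank in tax.strip().split(";") if rank]


def unclassified_fraction(text: str, depth: int) -> float:
    """Fraction of sequences with no classified name at the given depth (1 = domain)."""
    total = 0
    unclassified = 0
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        _, tax = line.split("\t", 1)
        ranks = lineage(tax)
        total += 1
        # Missing ranks and names starting with "unclassified" both count as unclassified.
        if len(ranks) < depth or ranks[depth - 1].lower().startswith("unclassified"):
            unclassified += 1
    return unclassified / total if total else 0.0


if __name__ == "__main__":
    # Toy classify.seqs-style output.
    output = (
        "q1\tBacteria(100);Firmicutes(97);\n"
        "q2\tBacteria(100);unclassified_Bacteria(100);\n"
        "q3\tBacteria(99);\n"
    )
    print(unclassified_fraction(output, 1))  # 0.0
    print(unclassified_fraction(output, 2))
```

Running the same function on the output from each reference gives a like-for-like number for the "Bacteria;unclassified_Bacteria" problem at the phylum level (depth 2).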

CDU · June 29, 2010, 6:50am

Hi, are there taxonomy files for the SILVA archaea and eukaryote FASTA files that can be used with the classify.seqs command?

pschloss · June 29, 2010, 11:26am

Yes - please see http://www.mothur.org/wiki/Silva_reference_files. I just posted these and would appreciate any feedback people have about the completeness of these references.
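The files on that wiki page are the ones to use. Purely as an illustration of how a domain-specific taxonomy file could be carved out of a combined mothur-format taxonomy file (not necessarily how those files were made), the sketch below groups lines by their first rank. The matching template sequences would also have to be pulled from the FASTA file, for instance with mothur's get.seqs command and an accnos file of the selected names. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def split_by_domain(text: str) -> Dict[str, List[str]]:
    """Group 'name<TAB>taxonomy' lines by their first rank (e.g. Bacteria, Archaea, Eukaryota)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        _, tax = line.split("\t", 1)
        domain = tax.strip().split(";", 1)[0]
        groups[domain].append(line)
    return dict(groups)


def names_in(lines: List[str]) -> List[str]:
    """Sequence names from taxonomy lines, e.g. to write an accnos list for subsetting the FASTA."""
    return [line.split("\t", 1)[0] for line in lines]


if __name__ == "__main__":
    # Toy combined taxonomy file.
    combined = (
        "a\tBacteria;Firmicutes;\n"
        "b\tArchaea;Euryarchaeota;\n"
        "c\tEukaryota;Fungi;\n"
        "d\tArchaea;Crenarchaeota;\n"
    )
    groups = split_by_domain(combined)
    print(names_in(groups["Archaea"]))  # ['b', 'd']
```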

echoly · August 14, 2012, 3:57am

Hi pschloss, are there SILVA reference files based on SILVA 104? Another question: I want to improve the classification of my Archaea data, because I get lots of unclassified sequences when I use the SILVA template provided for mothur. Do you have any advice? I downloaded SILVA 104, but I don’t know how to use it.

ctdoc · September 4, 2013, 12:46am

Hello-
I am wondering whether I can align to the SILVA database and classify with greengenes, and whether the classify.seqs command is the way to do this. I used the SILVA database earlier when going through the 454 SOP, but when I enter the classify.seqs command, do I enter the gg_99 .fasta and .tax files as the template and taxonomy files? This seems right to me, i.e., I’ve already aligned to SILVA at this point and am just taking my data and classifying it against the Greengenes database, right? Just making sure. When using RDP training set 9, I get loads of unclassified sequences. I think more will be identified using Greengenes. Any suggestions? And thanks!
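Whichever reference is chosen, the template and taxonomy files given to classify.seqs should describe the same set of sequence names, so a quick consistency check before a long run can save time. A minimal sketch, assuming a FASTA template and a two-column, tab-separated taxonomy file such as the gg_99 pair mentioned in the question; the function names are hypothetical and the data in the usage are toy values.

```python
from typing import Set, Tuple


def fasta_names(text: str) -> Set[str]:
    """Sequence names in a FASTA file (text after '>' up to the first whitespace)."""
    names = set()
    for line in text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def taxonomy_names(text: str) -> Set[str]:
    """Sequence names in the first column of a 'name<TAB>taxonomy' file."""
    return {line.split("\t", 1)[0].strip() for line in text.splitlines() if "\t" in line}


def check_reference(fasta_text: str, taxonomy_text: str) -> Tuple[Set[str], Set[str]]:
    """Return (names only in the template FASTA, names only in the taxonomy file)."""
    in_fasta = fasta_names(fasta_text)
    in_taxonomy = taxonomy_names(taxonomy_text)
    return in_fasta - in_taxonomy, in_taxonomy - in_fasta


if __name__ == "__main__":
    # Toy template and taxonomy; real files would be read from disk.
    template = ">1111\nACGT\n>2222\nGGCC\n"
    taxonomy = "1111\tk__Bacteria;p__Firmicutes;\n3333\tk__Bacteria;p__Proteobacteria;\n"
    missing_tax, missing_seq = check_reference(template, taxonomy)
    print(sorted(missing_tax), sorted(missing_seq))  # ['2222'] ['3333']
```

Two empty sets mean every template sequence has a taxonomy line and vice versa.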
