Taxonomic database implementation

Nathalia · March 29, 2012, 10:22am

Hey,

I tried classifying my sequences with classify.seqs, but many of them came back unclassified at the species level.
I ran these sequences through BLASTN against the nr database at NCBI and got hits with 100% identity to specific species (mostly P. acnes).
I looked through the .taxonomy files supplied with mothur and found no entries for this species or for many others.
Is there a way to supplement the database with new sequences?

pschloss · April 3, 2012, 1:57pm

Sure, you’ll have to modify the taxonomy file to add species names. The RDP only supplies taxonomies to the genus level.

FM_Kerckhof · September 19, 2012, 8:41am

Patrick, would you be so kind as to provide some pointers on how to modify this training set to include species-level classification?
Has anyone done it before? It could save a lot of duplicated work if one user made this modification available.

Kind regards,

FM Kerckhof

pschloss · October 2, 2012, 11:37am

Hi,

Yes, I know that people have done this. Here’s what the RDP training set might have in the taxonomy file…

J01695_S001099426 Bacteria;"Proteobacteria";Gammaproteobacteria;"Enterobacteriales";Enterobacteriaceae;Escherichia_Shigella;

Here's how you would change it...
J01695_S001099426 Bacteria;"Proteobacteria";Gammaproteobacteria;"Enterobacteriales";Enterobacteriaceae;Escherichia_Shigella;Escherichia_coli;

You would probably have to add sequences to both the taxonomy and fasta files so that there are multiple Escherichia_coli sequences, etc. FWIW, the greengenes database does have species-level data included for some sequences.
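For anyone who wants to script that edit, here is a rough Python sketch of the change described above. It assumes each taxonomy line is a sequence ID, whitespace (normally a tab), then a semicolon-terminated lineage. The function name add_species and the species_by_id lookup are made up for illustration; you would have to build that ID-to-species mapping yourself.

```python
def add_species(tax_line, species_by_id):
    """Append a species rank to one taxonomy line when a name is known.

    species_by_id is a hypothetical dict mapping sequence ID -> species name
    (with underscores instead of spaces, e.g. "Escherichia_coli").
    """
    # Assumed layout: "<id><whitespace><rank1;rank2;...;>"
    seq_id, lineage = tax_line.rstrip("\n").split(None, 1)
    lineage = lineage.strip()
    species = species_by_id.get(seq_id)
    if species is None:
        # No species known for this ID: leave the lineage unchanged.
        return f"{seq_id}\t{lineage}"
    if not lineage.endswith(";"):
        lineage += ";"
    return f"{seq_id}\t{lineage}{species};"


line = ('J01695_S001099426\tBacteria;"Proteobacteria";Gammaproteobacteria;'
        '"Enterobacteriales";Enterobacteriaceae;Escherichia_Shigella;')
print(add_species(line, {"J01695_S001099426": "Escherichia_coli"}))
```

Any new reference sequences would still need matching entries, with the same IDs, in the fasta file.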

ajone · October 11, 2012, 12:21pm

Another way to classify your sequences to the species level is to extract the sequences from the genera of interest and then do a phylogenetic analysis that includes sequences of the type strains from those genera. This approach is more robust than automatic classification of sequences, especially when dealing with medically important bacteria like Streptococcus, where species belonging to the same genus may be very closely related according to their 16S rRNA sequences.
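The extraction step could be sketched in Python roughly as below. This is only an illustration, assuming a taxonomy file with one "ID, whitespace, lineage" entry per line, where ranks may carry confidence scores such as "(98)" and optional double quotes. The function name ids_in_genus is hypothetical; mothur's own get.lineage command covers similar ground.

```python
import re


def ids_in_genus(tax_lines, genus):
    """Return the IDs of sequences whose lineage contains the given genus."""
    ids = []
    for line in tax_lines:
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        seq_id, lineage = parts
        # Drop a trailing confidence score like "(98)" and any surrounding quotes.
        ranks = [re.sub(r"\(\d+\)$", "", rank).strip('"')
                 for rank in lineage.strip().split(";") if rank]
        if genus in ranks:
            ids.append(seq_id)
    return ids


example = [
    "seq1\tBacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);"
    "Streptococcaceae(100);Streptococcus(98);",
    'seq2\tBacteria(100);"Proteobacteria"(100);Gammaproteobacteria(100);'
    '"Enterobacteriales"(100);Enterobacteriaceae(100);Escherichia_Shigella(95);',
]
print(ids_in_genus(example, "Streptococcus"))
```

The resulting IDs can then be used to pull the matching sequences out of the fasta file and align them together with the type-strain sequences.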

Anders
