Managing Non-standardized Taxonomy Levels in Classify Output

Hello All,

What a glorious day for microbial ecology!

To the point: I'd like to play around with my mothur output in R, but after importing my .taxonomy file I realized that the number of taxonomic levels is not the same for every lineage. Because many lineages include a sub-class, sub-order (etc.), the family and genus names end up in different columns from one sequence to the next. Clearly the problem originates with the formatting of the SILVA SSU database, but I thought I'd post here b/c of how active this forum is.

SO!

Question 1: Has anyone already managed to tackle this issue? If so, I welcome your advice.

Question 2: How can we get SILVA to put out a database with a format that handles these discrepancies in lineage depth? I noticed that the UNITE database (http://unite.ut.ee/) for fungal ITS regions uses prefixes that identify each taxonomic level (k__, p__, c__, o__, f__, g__, s__; see below). This comes in real handy when trying to standardize in R. Any ideas of who to approach?

Thanks in advance!

Roli

IIKFCBR02FT934 k__Fungi(100);p__Zygomycota(92);c__Zygomycota_class_Incertae_sedis(92);o__Mortierellales(92);f__Mortierellaceae(92);g__Mortierella(92);s__Mortierella_fimbricystis(89);
IIKFCBR02IJZVU k__Fungi(100);p__Ascomycota(89);c__Sordariomycetes(71);unclassified;unclassified;unclassified;unclassified;
IIKFCBR02FVFH2 k__Fungi(100);p__Ascomycota(100);c__Sordariomycetes(100);o__unclassified_Sordariomycetes(100);f__unclassified_Sordariomycetes(100);g__unclassified_Sordariomycetes(100);s__uncultured_Sordariomycetes(100);
IIKFCBR02IPMDN k__Fungi(100);unclassified;unclassified;unclassified;unclassified;unclassified;unclassified;
IIKFCBR02GD4OF k__Fungi(100);p__Ascomycota(91);c__Sordariomycetes(81);unclassified;unclassified;unclassified;unclassified;
IIKFCBR02G0T2A k__Fungi(100);p__Basidiomycota(100);c__Agaricomycetes(100);o__Agaricales(100);f__Entolomataceae(100);g__Entoloma(100);s__Entoloma_sp(69);
IIKFCBR02JJTP7 k__Fungi(100);p__Ascomycota(95);c__Sordariomycetes(71);unclassified;unclassified;unclassified;unclassified;

The greengenes databases have prefixed taxonomy classifications similar to your UNITE example.

I believe the mothur authors have suggested that the current greengenes database is also more taxonomically complete than the SILVA databases (though worse for alignment). In most of my current work I'm using SILVA for alignment and greengenes for classification.
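
For anyone landing here with prefixed output like the UNITE lines above, here is a rough sketch of how the prefixes can be used to force every lineage into the same set of columns before loading it into R. It is in Python rather than R, and it is only a sketch: the function name parse_taxonomy_line and the rank names in RANKS are my own choices, and I am assuming the prefixes are exactly the k__/p__/c__/o__/f__/g__/s__ ones shown in the example, with an optional confidence value in parentheses. Anything without a recognized prefix is left as "unclassified".

```python
import re

# Rank prefixes as they appear in the UNITE example above (k__ ... s__).
# The full rank names are my own labels, not something mothur produces.
RANKS = {"k": "kingdom", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}

# Matches e.g. "p__Ascomycota(89)"; the "(89)" confidence part is optional.
TAXON = re.compile(r"^([a-z])__(.+?)(?:\((\d+)\))?$")


def parse_taxonomy_line(line):
    """Turn one 'seq_id <whitespace> lineage' line into a dict keyed by rank."""
    seq_id, lineage = line.strip().split(None, 1)
    row = {"id": seq_id}
    # Start with every rank unclassified so all rows have identical columns.
    row.update({name: "unclassified" for name in RANKS.values()})
    for field in lineage.split(";"):
        match = TAXON.match(field.strip())
        if match and match.group(1) in RANKS:
            # Place the name by its prefix, not by its position in the string.
            row[RANKS[match.group(1)]] = match.group(2)
    return row


if __name__ == "__main__":
    example = ("IIKFCBR02IJZVU k__Fungi(100);p__Ascomycota(89);"
               "c__Sordariomycetes(71);unclassified;unclassified;"
               "unclassified;unclassified;")
    print(parse_taxonomy_line(example))
```

Because each name is placed by its prefix rather than by its position, an extra or missing level in one lineage no longer shifts the family and genus columns. The resulting dicts can be written out with Python's csv.DictWriter and read into R as an ordinary table. This does not help with unprefixed SILVA lineages, since there is nothing in the string itself that says which level a name belongs to.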