Dear Mothur users and developers,
Using the classify.seqs command I noticed that the taxonomic summary file seems to be inconsistent in taxon name usage.
In the file “mysamples.final.rdp6.tax.summary” one will find several taxa within a given taxonomic level that are called “unclassified”.
For example, at the phylum level (level 2), rank IDs 0.1.34.1 and 0.1.35.1 both have “unclassified” as their taxon name. I would expect them to be named “unclassified……”.
I think that if this is a real bug, it will seriously affect downstream taxonomy-based diversity analyses.
Best wishes,
Guus
It’s not a bug, it’s a feature. Don’t worry, this doesn’t affect downstream analyses. When you run the phylotype command on the companion taxonomy file, mothur keeps everything straight.
Hi Patrick,
Thanks for your reply.
Yes, now I can see why this doesn’t affect alpha and beta-diversity analyses.
However, I still don’t understand the advantage of this “feature”. It makes it hard to use the rdp6.tax.summary file to generate, e.g., a pie chart of the taxonomic breakdown at a given taxonomic level: you would end up with several sections called “unclassified”. Relabeling these by hand will be a pain when dealing with many very species-rich samples.
Why not call them “unclassified……” like the RDP Multiclassifier does?
Best,
Guus
So the problem is what to do with sequences that are assigned to TM7 but have an unclassified class, order, family, and genus. Our goal was to have the same number of sequences at each taxonomic level in the table. It is also awkward to write TM7;class_incertae_sedis;order_incertae_sedis;family_incertae_sedis;genus_incertae_sedis;, because the rdp6 taxonomy is the only outline that actually corresponds to the Linnean taxonomy, and we don’t want to make the exception the rule. Finally, the period-separated series of numbers in the second column (the rank ID) already encodes where each “unclassified” entry sits: dropping its last number gives the rank ID of its parent taxon.
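To make the two points above concrete, here is a minimal Python sketch on invented data. It assumes each summary row can be reduced to (taxlevel, rankID, taxon, total); the counts, the `ROWS` list and both helper functions are hypothetical and not part of mothur. The “unclassified” placeholder under TM7 keeps every level summing to the same total, and the parent of any entry is found by dropping the last component of its rank ID.

```python
from collections import defaultdict

# Hypothetical (taxlevel, rankID, taxon, total) rows; the counts are invented.
# 40 reads are assigned to TM7 but to no class, so level 3 carries an
# "unclassified" placeholder under TM7 (rankID 0.1.2.1).
ROWS = [
    (0, "0", "Root", 100),
    (1, "0.1", "Bacteria", 100),
    (2, "0.1.1", "Firmicutes", 60),
    (2, "0.1.2", "TM7", 40),
    (3, "0.1.1.1", "Clostridia", 60),
    (3, "0.1.2.1", "unclassified", 40),
]


def totals_per_level(rows):
    """Sum the read counts of all taxa at each taxonomic level."""
    sums = defaultdict(int)
    for level, _rank_id, _taxon, total in rows:
        sums[level] += total
    return dict(sums)


def parent_taxon(rows, rank_id):
    """Look up the parent taxon name by dropping the last rankID component."""
    parent_id = rank_id.rsplit(".", 1)[0]
    return next(taxon for _lvl, rid, taxon, _tot in rows if rid == parent_id)


print(totals_per_level(ROWS))         # {0: 100, 1: 100, 2: 100, 3: 100}
print(parent_taxon(ROWS, "0.1.2.1"))  # TM7
```

Without the placeholder row, level 3 would sum to 60 instead of 100, which is the inconsistency the design avoids.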
If you have ideas that we could apply regardless of the taxonomy outline we’d love to hear how to make the output more useful.
Thanks for the feedback…
Pat
Hi Pat,
Ok, forget that I ever mentioned a piechart :oops: “[Piecharts] have no place in the world of grownups, and occupy the same semiotic space as short pants, a runny nose, and chocolate smeared on one’s face…”
I see the problem with TM7… the RDP Multiclassifier handles this pragmatically (non-Linnean) by putting the genus “TM7_incertae_sedis” as a member of the “generic taxonomic group” unclassified_TM7 within the phylum TM7.
Still, using the rank ID to trace back the “higher” taxonomic levels for each group of unclassifieds in a bar graph can be a lot of work, and I am tempted to write a script to automate this.
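A script like that could be as small as the Python sketch below. It assumes a tab-separated summary whose header includes `taxlevel`, `rankID` and `taxon` columns (the extra `daughterlevels` and `total` columns and all values in `SAMPLE` are assumptions for illustration, not real mothur output). Each “unclassified” taxon is renamed after its nearest classified ancestor, found by repeatedly dropping the last rank ID component; this mimics the RDP-style “unclassified_TM7” naming but is not something mothur itself does.

```python
import csv
import io

# Hypothetical tab-separated summary; column layout and counts are assumptions.
SAMPLE = (
    "taxlevel\trankID\ttaxon\tdaughterlevels\ttotal\n"
    "0\t0\tRoot\t1\t100\n"
    "1\t0.1\tBacteria\t2\t100\n"
    "2\t0.1.1\tFirmicutes\t1\t60\n"
    "2\t0.1.2\tTM7\t1\t40\n"
    "3\t0.1.1.1\tClostridia\t1\t60\n"
    "3\t0.1.2.1\tunclassified\t1\t40\n"
    "4\t0.1.2.1.1\tunclassified\t0\t40\n"
)


def relabel_unclassified(rows):
    """Rename each 'unclassified' taxon after its nearest classified ancestor.

    rows: list of dicts with at least 'rankID' and 'taxon' keys.
    Returns a new list of dicts; the input rows are not modified.
    """
    name_by_rank = {row["rankID"]: row["taxon"] for row in rows}
    out = []
    for row in rows:
        new = dict(row)
        if row["taxon"] == "unclassified":
            rank = row["rankID"]
            # Walk up the rank ID prefixes until a classified name is found.
            while "." in rank:
                rank = rank.rsplit(".", 1)[0]
                name = name_by_rank.get(rank, "unclassified")
                if name != "unclassified":
                    new["taxon"] = f"unclassified_{name}"
                    break
        out.append(new)
    return out


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
for row in relabel_unclassified(rows):
    print(row["taxlevel"], row["rankID"], row["taxon"])
```

Both unclassified rows under TM7 come out as `unclassified_TM7`, so bars or slices at any level get distinct, traceable labels.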
Best wishes,
Guus