Errors in classification strings?

I have been working with a 16S data set and was showing my taxonomy classifications to my PI when he pointed out that the string “k__Bacteria(100);p__Firmicutes(100);c__Clostridia(100);o__Clostridiales(100);f__Lachnospiraceae(100);g__Clostridium(100);s__Clostridiumscinden(100);” is incorrect: the genus Clostridium should be in the Clostridiaceae! I found this same error multiple times in the file, and now I’m concerned about other taxonomic strings in the data set. Do you have any ideas as to why this taxonomic string is coming up? I am using the Greengenes database for my final OTU classification step (see command strings below).

mothur v.1.25.1 command strings used for classification, with “…” for other intervening steps:

mothur > merge.files(input=nogap.archaea.fasta-nogap.bacteria.fasta-nogap.eukarya.fasta, output=nogap.all.July2012.fasta)
Output File Name: nogap.all.July2012.fasta
mothur > merge.files(input=silva.archaea.silva.tax-silva.bacteria.silva.tax-silva.eukarya.silva.tax, output=silva.all.silva.July2012.tax)
Output File Name: silva.all.silva.July2012.tax
mothur > classify.seqs(fasta=Cows2.unique.good.filter.unique.precluster.pick.fasta, template=nogap.all.July2012.fasta, name=Cows2.unique.good.filter.unique.precluster.pick.names, taxonomy=silva.all.silva.July2012.tax, cutoff=60, processors=3)

mothur > classify.seqs(fasta=Cows2.final.fasta, name=Cows2.final.names, group=Cows2.final.groups, template=gg_99.pds.ng.fasta, taxonomy=gg_99.pds.tax, cutoff=60)
mothur > classify.otu(list=Cows2.final.phylip.an.list, name=Cows2.final.names, taxonomy=Cows2.final.pds.taxonomy, group=Cows2.final.groups, label=0.05, cutoff=60, basis=otu)
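To scan a whole taxonomy file for lineages like the one in the question, it can help to split each string into rank, name, and confidence. Below is a minimal Python sketch (not part of mothur), assuming the `x__Name(confidence)` field layout shown in the string above. `parse_taxonomy` and `FIELD_PATTERN` are hypothetical names, and treating the rank prefix and confidence as optional is an assumption made so that unprefixed fields do not break the parser.

```python
from __future__ import annotations

import re

# One field looks like "g__Clostridium(100)": a one-letter rank prefix,
# the taxon name, and a confidence value in parentheses. Prefix and
# confidence are both optional here (an assumption, for robustness).
FIELD_PATTERN = re.compile(r"^(?:([a-z])__)?(.*?)(?:\((\d+)\))?$")


def parse_taxonomy(taxonomy: str) -> list[tuple[str | None, str, int | None]]:
    """Split a semicolon-delimited taxonomy string into (rank, name, confidence)."""
    parsed = []
    for field in taxonomy.strip().rstrip(";").split(";"):
        rank, name, confidence = FIELD_PATTERN.match(field).groups()
        parsed.append((rank, name, int(confidence) if confidence else None))
    return parsed


# The string from the question above.
example = ("k__Bacteria(100);p__Firmicutes(100);c__Clostridia(100);"
           "o__Clostridiales(100);f__Lachnospiraceae(100);"
           "g__Clostridium(100);s__Clostridiumscinden(100);")
for rank, name, confidence in parse_taxonomy(example):
    print(rank, name, confidence)
```

Fields without a `k__`-style prefix come back with a rank of `None`, so they can be spotted rather than silently dropped.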

Any ideas? Thank you!

-Kelsea Jewell

The genus Clostridium is a garbage can and is borderline meaningless: it is not monophyletic. Here’s what is currently in Greengenes for the genus Clostridium, with the number of sequences represented by each lineage:

858 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;
3 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__ClostridialesFamilyXI.IncertaeSedis;g__Clostridium;
1148 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Clostridium;
73 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Clostridium;
93 k__Bacteria;p__Tenericutes;c__Erysipelotrichi;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Clostridium;

And people wonder why I hate “names”? :)
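To check whether other genus names in the reference taxonomy have the same problem, one could tabulate the distinct families recorded under each genus label. A minimal Python sketch, assuming a “count lineage” layout like the list above (those five lines are copied in as sample data); `families_per_genus` and `GG_LINES` are hypothetical names, not part of mothur or Greengenes.

```python
from collections import defaultdict

# The Greengenes lineages listed above, in "count lineage" form.
GG_LINES = """\
858 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;
3 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__ClostridialesFamilyXI.IncertaeSedis;g__Clostridium;
1148 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Clostridium;
73 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Clostridium;
93 k__Bacteria;p__Tenericutes;c__Erysipelotrichi;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Clostridium;
"""


def families_per_genus(lines):
    """Map each genus name to {family: number of reference sequences}."""
    table = defaultdict(lambda: defaultdict(int))
    for line in lines:
        if not line.strip():
            continue
        count, lineage = line.split(maxsplit=1)
        # "k__Bacteria;...;g__Clostridium;" -> {"k": "Bacteria", ..., "g": "Clostridium"}
        ranks = dict(field.split("__", 1)
                     for field in lineage.strip().rstrip(";").split(";"))
        table[ranks["g"]][ranks["f"]] += int(count)
    return {genus: dict(families) for genus, families in table.items()}


for genus, families in families_per_genus(GG_LINES.splitlines()).items():
    if len(families) > 1:  # the same genus name sits under more than one family
        print(f"{genus}: {len(families)} families, {sum(families.values())} sequences")
        for family, n in sorted(families.items(), key=lambda kv: -kv[1]):
            print(f"  {family}: {n}")
```

For the full reference file, which (assuming the usual two-column, tab-separated mothur taxonomy layout) pairs each sequence name with its lineage, the same per-lineage counts could first be built with `collections.Counter` over the taxonomy column.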

Thank you! Hmm, that’s frustrating, but probably unavoidable in the world of bacterial identification. Would you suggest relying only on family-level identifications when sorting through the reads, or trying to distinguish the multiple different “Clostridium” OTUs?

If you’re particularly interested in this group, then I would probably report things as family/genus combinations (for example, Lachnospiraceae/Clostridium versus Clostridiaceae/Clostridium).
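One way to apply this when summarizing OTUs is to build each label from the family and genus fields of the consensus taxonomy. A minimal Python sketch: `family_genus_label` is a hypothetical helper, it assumes the `x__Name(confidence)` field format used in this thread, and the fallback to “unclassified” for a missing rank is an assumption, not mothur’s own convention.

```python
import re


def family_genus_label(taxonomy: str) -> str:
    """Return a 'Family/Genus' label from a mothur-style taxonomy string."""
    names = {}
    for field in taxonomy.strip().rstrip(";").split(";"):
        field = re.sub(r"\(\d+\)$", "", field)  # drop the trailing confidence value
        if "__" in field:
            rank, name = field.split("__", 1)
            names[rank] = name
    # Assumed fallback when the family or genus rank is absent.
    return f"{names.get('f', 'unclassified')}/{names.get('g', 'unclassified')}"


examples = [
    # The string from the original question (with confidence values).
    "k__Bacteria(100);p__Firmicutes(100);c__Clostridia(100);o__Clostridiales(100);"
    "f__Lachnospiraceae(100);g__Clostridium(100);s__Clostridiumscinden(100);",
    # A Greengenes lineage from the list above (no confidence values).
    "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;",
]
for taxonomy in examples:
    print(family_genus_label(taxonomy))
# Lachnospiraceae/Clostridium
# Clostridiaceae/Clostridium
```

Keeping the family in the label means the two “Clostridium” groups stay separate in any downstream table instead of being pooled under one genus name.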

pat