Errors with classify.otu and silva taxonomy for Archaea

Hello -

I have encountered a strange error using the Silva taxonomy that I can’t figure out. My sample contains both archaeal and bacterial sequences, so I had to use the RDP taxonomy to tease them apart.

First, I classified the sequences using the RDP taxonomy:
mothur > classify.seqs(fasta=CSarch.unique.good.filter.unique.precluster.pick.fasta, template=trainset6_032010.rdp.fasta, taxonomy=trainset6_032010.rdp.tax, processors=2, iters=1000, cutoff=80, method=bayesian)

Next, I removed the sequences classified as bacteria:
mothur > remove.lineage(fasta=CSarch.unique.good.filter.unique.precluster.pick.fasta, taxonomy=CSarch.unique.good.filter.unique.precluster.pick.rdp.taxonomy, name=CSarch.unique.good.filter.unique.precluster.pick.names, group=CSarch.good.pick.groups, taxon=Bacteria)

Then, I renamed the remaining files to reflect their new status as only archaea sequences:
mothur > system(cp CSarch.unique.good.filter.unique.precluster.pick.pick.fasta CSarch1.final.fasta)
mothur > system(cp CSarch.unique.good.filter.unique.precluster.pick.rdp.pick.taxonomy CSarch1.final.taxonomy)
mothur > system(cp CSarch.unique.good.filter.unique.precluster.pick.pick.names CSarch1.final.names)
mothur > system(cp CSarch.good.pick.pick.groups CSarch1.final.groups)

However, because the RDP taxonomy has so few Archaea sequences, it does not do a very good job of classifying my sequences; a large number came back unclassified. The Silva taxonomy has a larger database of Archaea sequences, so I decided to reclassify the sequences using Silva (which I couldn’t do in the first place since the Silva database is separated into Bacteria and Archaea).
mothur > classify.seqs(fasta=CSarch1.final.fasta, template=silva.archaea.fasta, taxonomy=silva.archaea.silva.tax, processors=2, iters=1000, method=bayesian, cutoff=80)

This all seemed to work, but when I make distance matrices and cluster the data into OTUs, I run into problems with the classify.otu command:
mothur > classify.otu(taxonomy=CSarch1.final.taxonomy, name=CSarch1.final.names, list=CSarch1.final.an.list, basis=sequence, group=CSarch1.final.groups, label=unique-0.01-0.02-0.03-0.10, cutoff=80, reftaxonomy=silva.archaea.silva.tax)
unique 7567
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03HBHTT. This may cause totals of daughter levels not to add up in summary file.
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03F60UP. This may cause totals of daughter levels not to add up in summary file.
0.01 3496
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03HBHTT. This may cause totals of daughter levels not to add up in summary file.
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03F60UP. This may cause totals of daughter levels not to add up in summary file.
0.02 985
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03HBHTT. This may cause totals of daughter levels not to add up in summary file.
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03F60UP. This may cause totals of daughter levels not to add up in summary file.
0.03 317
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03HBHTT. This may cause totals of daughter levels not to add up in summary file.
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03F60UP. This may cause totals of daughter levels not to add up in summary file.
Your file does not include the label 0.10. I will use 0.04.
0.04 176
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03HBHTT. This may cause totals of daughter levels not to add up in summary file.
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03F60UP. This may cause totals of daughter levels not to add up in summary file.

Am I using the wrong Silva taxonomy file? I used the same silva.archaea.silva.tax file for both the classify.seqs and classify.otu commands, so how can I be running into this issue? What is the difference between the silva.archaea.rdp.tax file and the corresponding silva.tax file?

Thanks,
Kristina

Hi Kristina,

classify.seqs(fasta=CSarch1.final.fasta, template=silva.archaea.fasta, taxonomy=silva.archaea.silva.tax, processors=2, iters=1000, method=bayesian, cutoff=80) will create a taxonomy file called CSarch1.final.silva.taxonomy, not CSarch1.final.taxonomy.

You are running the classify.otu command with CSarch1.final.taxonomy, which was classified using trainset6_032010.rdp.tax, so it does not match the reftaxonomy=silva.archaea.silva.tax you supplied.

You want:

classify.otu(taxonomy=CSarch1.final.taxonomy, name=CSarch1.final.names, list=CSarch1.final.an.list, basis=sequence, group=CSarch1.final.groups, label=unique-0.01-0.02-0.03-0.10, cutoff=80, reftaxonomy=trainset6_032010.rdp.tax)

or

classify.otu(taxonomy=CSarch1.final.silva.taxonomy, name=CSarch1.final.names, list=CSarch1.final.an.list, basis=sequence, group=CSarch1.final.groups, label=unique-0.01-0.02-0.03-0.10, cutoff=80, reftaxonomy=silva.archaea.silva.tax)
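
One way to check that a taxonomy file and a reference taxonomy belong together before running classify.otu is to compare the taxon names they contain. The Python sketch below is not part of mothur. The function names lineage_taxa and missing_taxa and the toy lineages are made up for illustration, and it assumes each line holds a sequence name, whitespace, and a semicolon-separated lineage with optional numeric confidence scores in parentheses.

```python
import re

# Matches a trailing numeric confidence score such as "(100)" or "(87.5)".
CONFIDENCE = re.compile(r"\(\d+(?:\.\d+)?\)$")


def lineage_taxa(line):
    """Return the taxon names on one taxonomy line, without numeric scores."""
    _seq_id, lineage = line.split(None, 1)
    names = [t.strip() for t in lineage.strip().split(";")]
    return [CONFIDENCE.sub("", t) for t in names if t]


def missing_taxa(taxonomy_lines, reference_lines):
    """Taxa used in the classification that never occur in the reference."""
    known = {t for line in reference_lines if line.strip()
             for t in lineage_taxa(line)}
    used = {t for line in taxonomy_lines if line.strip()
            for t in lineage_taxa(line)}
    return sorted(used - known)


# Toy data standing in for the real files (hypothetical lineages).
rdp_classified = ["seqA\tArchaea(100);Euryarchaeota(95);Methanobacteria(88);"]
silva_reference = ["ref1\tArchaea;Crenarchaeota;Soil_Crenarchaeotic_Group;"]

print(missing_taxa(rdp_classified, silva_reference))
# ['Euryarchaeota', 'Methanobacteria']
```

With real files, open file handles can be passed in place of the lists, for example missing_taxa(open("CSarch1.final.taxonomy"), open("silva.archaea.silva.tax")). A long list of missing names would suggest the two files come from different classifications.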

I hope this helps,
Sarah

Sarah-

Thanks for the suggestion, but it doesn’t appear to have worked:

mothur > classify.otu(taxonomy=CSarch1.final.silva.taxonomy, name=CSarch1.final.names, list=CSarch1.final.an.list, basis=sequence, group=CSarch1.final.groups, label=unique-0.01-0.02-0.03-0.10, cutoff=80, reftaxonomy=silva.archaea.silva.tax)

unique 7567
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03HBHTT. This may cause totals of daughter levels not to add up in summary file.
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03F60UP. This may cause totals of daughter levels not to add up in summary file.
0.01 3496
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03HBHTT. This may cause totals of daughter levels not to add up in summary file.
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03F60UP. This may cause totals of daughter levels not to add up in summary file.
0.02 985
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03HBHTT. This may cause totals of daughter levels not to add up in summary file.
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03F60UP. This may cause totals of daughter levels not to add up in summary file.
0.03 317
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03HBHTT. This may cause totals of daughter levels not to add up in summary file.
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03F60UP. This may cause totals of daughter levels not to add up in summary file.
Your file does not include the label 0.10. I will use 0.04.
0.04 176
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03HBHTT. This may cause totals of daughter levels not to add up in summary file.
Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03F60UP. This may cause totals of daughter levels not to add up in summary file.

I also tried the RDP reference taxonomy and got an error as well. Is it possible that the error stems from the remove.lineage command? It is also possible that there is an inherent problem with the Silva reference taxonomy.

"Warning: cannot find taxon Soil_Crenarchaeotic_Group in reference taxonomy tree at level 2 for G40ZT5B03F60UP. This may cause totals of daughter levels not to add up in summary file. "
The warning shows mothur searching for Soil_Crenarchaeotic_Group at the wrong level of the reference taxonomy: it reports level 2, but this taxon should be at level 3.

Here is an example of a line from silva.archaea.silva.tax:
FJ784259.1 Archaea;Crenarchaeota;Soil_Crenarchaeotic_Group(SCG);
As you can see, the SCG is at level 3.
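
The level count can be confirmed by splitting that line. This is only a small Python sketch: it assumes the sequence name is separated from the lineage by whitespace and that levels are counted from 1 starting at the domain, which may not be how mothur numbers them internally.

```python
# The example line from silva.archaea.silva.tax (tab separator assumed).
line = "FJ784259.1\tArchaea;Crenarchaeota;Soil_Crenarchaeotic_Group(SCG);"

# Split off the sequence name, then break the lineage into its taxa.
seq_id, lineage = line.split(None, 1)
levels = [taxon for taxon in lineage.strip().split(";") if taxon]

# Print each taxon with its 1-based depth in the lineage.
for depth, taxon in enumerate(levels, start=1):
    print(depth, taxon)
```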

Could this be the problem?

Kristina

Hi Kristina,
Ahh… When mothur sees parentheses in a taxon name, it assumes they hold a confidence score and trims them, so the (SCG) was removed. We have posted new reference taxonomy files without the () characters. You can download them at http://www.mothur.org/wiki/Silva_reference_files. Thanks for bringing this issue to our attention!
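
To illustrate the effect, here is a rough Python sketch. It is not mothur's actual code. It assumes a rule that removes any trailing parenthesized text from a taxon name, and the helper names strip_score and taxa_with_parentheses are hypothetical. The second helper lists reference taxa that contain parentheses, which may help when checking an older reference file.

```python
import re

# Assumed rule: any trailing "(...)" is treated as a confidence score.
ANY_PARENS = re.compile(r"\([^()]*\)$")


def strip_score(taxon):
    """Remove a trailing parenthesized token, whatever it contains."""
    return ANY_PARENS.sub("", taxon)


def taxa_with_parentheses(reference_lines):
    """Return the reference taxon names that contain a '(' character."""
    flagged = set()
    for line in reference_lines:
        if not line.strip():
            continue
        _seq_id, lineage = line.split(None, 1)
        for taxon in lineage.strip().split(";"):
            if "(" in taxon:
                flagged.add(taxon)
    return sorted(flagged)


# A real confidence score is removed as intended...
print(strip_score("Crenarchaeota(100)"))              # Crenarchaeota
# ...but so is an abbreviation that belongs to the name.
print(strip_score("Soil_Crenarchaeotic_Group(SCG)"))  # Soil_Crenarchaeotic_Group

reference = ["FJ784259.1\tArchaea;Crenarchaeota;Soil_Crenarchaeotic_Group(SCG);"]
print(taxa_with_parentheses(reference))
# ['Soil_Crenarchaeotic_Group(SCG)']
```

The trimmed name no longer matches the name stored in the reference, which is consistent with the "cannot find taxon" warning above.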
-Sarah