tax.summary bug?

Dear all,
I have a strange problem with the classify.seqs command.
It runs fine with my data and silva.slv.taxonomy, but in the two generated files, myFile.silva.slv.taxonomy and myFile.silva.slv.tax.summary, I found several taxonomy levels that are not in silva.slv.taxonomy.
For example:
Bacteria;Firmicutes;Clostridiales;uncultured
or Bacteria;Firmicutes;Clostridiales;Johnsonella_et_rel.;Johnsonella_et_rel.;Johnsonella_et_rel.;uncultured;uncultured
are listed in myFile.silva.slv.tax.summary with non-null values, but I could not find them in silva.slv.taxonomy.
All added levels are “uncultured” levels.
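
One way to list exactly which reported lineages are missing from the reference is a small script along these lines. It is only a sketch: it assumes both files are tab-separated with a sequence name in the first column and a semicolon-separated lineage in the second, and the function names are made up for illustration, not part of mothur.

```python
import re

def split_lineage(lineage):
    """Drop bootstrap scores such as '(91)' and split the lineage on ';'."""
    cleaned = re.sub(r"\(\d+\)", "", lineage)
    return [taxon for taxon in cleaned.strip().split(";") if taxon]

def prefixes(taxa):
    """Yield every prefix of a lineage, e.g. (Bacteria,), (Bacteria, Firmicutes), ..."""
    for depth in range(1, len(taxa) + 1):
        yield tuple(taxa[:depth])

def unknown_lineages(reference_lines, assigned_lines):
    """Return lineages found in the classification output but not in the reference."""
    known = set()
    for line in reference_lines:
        if not line.strip():
            continue
        _name, lineage = line.rstrip("\n").split("\t", 1)
        known.update(prefixes(split_lineage(lineage)))

    missing = set()
    for line in assigned_lines:
        if not line.strip():
            continue
        _name, lineage = line.rstrip("\n").split("\t", 1)
        for prefix in prefixes(split_lineage(lineage)):
            if prefix[-1] == "unclassified":
                break  # padding levels, not real taxa
            if prefix not in known:
                missing.add(";".join(prefix))
    return sorted(missing)
```

It could be called with the open file handles, for example unknown_lineages(open("silva.slv.taxonomy"), open("myFile.silva.slv.taxonomy")).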

Does anybody have an idea how this is possible?

Thank you.

Christine.

Weird, can you email us (mothur.bugs@gmail.com) the odd sequences and your mothur.*.logfile?

Hi Christine,

We have made several changes to the classifier since version 1.8. We have increased the speed, made the summary file rankIDs static between runs so the summary files can be compared, added groups and name options to the summary file output, added additional error checking to make sure the template and taxonomy files match, and fixed a few bugs. These changes seem to have solved the problem you are having. We do extend the taxonomies reported so that all sequences are classified to the same level, but since we use “unclassified”, not “uncultured”, I don’t believe that was the cause of the error you found. The extension will result in lines like:

GBED33301DCNTO Bacteria(100);Actinobacteria(91);Actinomycetaceae-Bifidobacteriaceae(91);Bifidobacteriaceae(91);Bifidobacterium_choerinum_et_rel.(91);Bifidobacterium_longum_et_rel.(90);unclassified;unclassified;unclassified;unclassified;unclassified;unclassified;unclassified;

where the taxonomy in silva.slv.taxonomy is:

Bacteria;Actinobacteria;Actinomycetaceae-Bifidobacteriaceae;Bifidobacteriaceae;Bifidobacterium_choerinum_et_rel.;Bifidobacterium_longum_et_rel.;
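
The padding idea can be sketched in a few lines of Python. This is not mothur's actual code, just an illustration of the behaviour described above; the function name and the depth argument are assumptions.

```python
def extend_lineage(lineage, depth, filler="unclassified"):
    """Pad a ';'-separated lineage with filler so it has `depth` levels."""
    taxa = [taxon for taxon in lineage.strip().split(";") if taxon]
    taxa += [filler] * max(0, depth - len(taxa))
    return ";".join(taxa) + ";"

def extend_all(lineages, filler="unclassified"):
    """Pad every lineage to the depth of the deepest one in the list."""
    depth = max(len([t for t in l.split(";") if t]) for l in lineages)
    return [extend_lineage(l, depth, filler) for l in lineages]
```

In the example above, the classified part has 6 levels and 7 "unclassified" levels were appended, so the deepest lineage in that run would have had 13 levels.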

Thanks for taking the time to report the bug, and helping us make mothur better.

Hello
After using classify.seqs, I blasted a couple of my OTUs against the greengenes and NCBI databases: some that could be classified, and some that couldn’t be classified past the Bacteria level. One of the OTUs was 97% similar to Pseudomonas. Another member of the lab tried the same thing with her dataset and also found that one of her OTUs was similar enough to Pseudomonas to be placed in the same genus, but in the summary file these OTUs couldn’t be classified past Bacteria. Any suggestions why our OTUs aren’t being assigned to Pseudomonas? The other OTU sequences I blasted against greengenes and NCBI matched the classification in my taxonomy summary file.
Thanks

Could you send the sequences to mothur.bugs@gmail.com?

So here’s one of the sequences you sent us…

SeqA
CAACGCGAAGAACCTTACCAGGCCTTGACATGCAGAGAACTTTCCAGAGATGGATTGGGTGCCTTCGGGAACTCTGACACAGGTGGTGCATGGCTGTCG

Here’s what we get when we run this through the RDP website…
Root[100%] Bacteria[100%] “Proteobacteria”[97%] Gammaproteobacteria[97%] Alteromonadales[42%] Alteromonadaceae[42%] Salinimonas[32%]

When we blastn this against the nt database at NCBI, we get pages of matches to “uncultured Pseudomonas clone” accessions.

Within mothur, if you use the most recent RDP taxonomy you will get…
Bacteria(100);Proteobacteria(91);Gammaproteobacteria(91);Alteromonadales(59);Alteromonadaceae(57);Aestuariibacter(51)

The greengenes taxonomy…
Bacteria(100);Proteobacteria(69);Gammaproteobacteria(68);Pseudomonadaceae(41);Unclassified(24)

The SILVA taxonomy…
Bacteria(100);Cyanobacteria(16);SubsectionIII(16);Halomicronema(16);

The NCBI taxonomy…
Bacteria(96);Proteobacteria(87);Gammaproteobacteria(85);Pseudomonadales(44);Pseudomonadaceae(43);Pseudomonas(43);

If you only use those matches where the bootstrap values are above 50%, mothur actually does better than the RDP website. Obviously there are some differences between the taxonomies, but that is why there are multiple taxonomy outlines available. As a rule, I wouldn’t put much trust in blasting sequences against the nt database at NCBI. blastn is generally pretty poor at finding high-quality matches, and the taxonomy is supplied by the user - I have found cases where people claim a new phylum of bacteria based on sequences that are really E. coli. Hopefully, this answers your question and gives people a sense of why it is helpful to use multiple taxonomies.
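
To make the 50% rule concrete, here is a rough Python sketch that cuts a mothur-style lineage at the first level whose bootstrap value falls below a cutoff. The function name is made up, and treating a value exactly equal to the cutoff as a pass is my assumption; none of the values above sit exactly at 50, so it makes no difference here.

```python
import re

def truncate_at_cutoff(lineage, cutoff=50):
    """Keep leading taxa whose bootstrap value is at least `cutoff`."""
    kept = []
    for field in lineage.strip().strip(";").split(";"):
        match = re.fullmatch(r"(.+)\((\d+)\)", field)
        # Stop at the first level without a score or with a score below the cutoff.
        if match is None or int(match.group(2)) < cutoff:
            break
        kept.append(match.group(1))
    return ";".join(kept)

# The four mothur results for SeqA quoted above.
results = {
    "RDP": "Bacteria(100);Proteobacteria(91);Gammaproteobacteria(91);Alteromonadales(59);Alteromonadaceae(57);Aestuariibacter(51)",
    "greengenes": "Bacteria(100);Proteobacteria(69);Gammaproteobacteria(68);Pseudomonadaceae(41);Unclassified(24)",
    "SILVA": "Bacteria(100);Cyanobacteria(16);SubsectionIII(16);Halomicronema(16);",
    "NCBI": "Bacteria(96);Proteobacteria(87);Gammaproteobacteria(85);Pseudomonadales(44);Pseudomonadaceae(43);Pseudomonas(43);",
}

for name, lineage in results.items():
    print(name, truncate_at_cutoff(lineage))
```

With this cutoff the greengenes and NCBI results both stop at Gammaproteobacteria, and the SILVA result stops at Bacteria.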

Pat