OTU classification in taxonomy file and RDP classification of rep sequence don't agree

I used mothur to screen, align and cluster near full length 16S rRNA gene sequences from Sanger sequencing into OTUs (commands used are below). My contigs.good.unique.opti_mcc.0.03.cons.taxonomy shows OTU02 as Limnobacter with confidence of 95%. If I generate representative sequences for each OTU using the get.oturep command and then run these sequences through RDP’s classifier, this OTU is classified as a Rhizobacter with confidence of 100%. Rhizobacter is the best hit in the NCBI database as well. This disagreement with the taxonomy file happens for a few of the OTUs. Can you please help me understand why this is happening?


summary.seqs(fasta=contigs.fasta)
screen.seqs(fasta=current, group=group_cut.txt, maxambig=14, maxlength=1400)
summary.seqs(fasta=contigs.good.fasta)
unique.seqs(fasta=contigs.good.fasta)
align.seqs(fasta=contigs.good.unique.fasta, reference=silva.nr_v123.align, processors=2)
dist.seqs(fasta=contigs.good.unique.align, cutoff=0.20)
cluster(column=contigs.good.unique.dist, name=contigs.good.names)
summary.single(calc=sobs, label=0.03)
make.shared(list=contigs.good.unique.opti_mcc.list, group=group_cut.good.txt, label=0.03)
count.groups(shared=contigs.good.unique.opti_mcc.shared)
sub.sample(shared=contigs.good.unique.opti_mcc.shared, size=26)
classify.seqs(fasta=contigs.good.unique.fasta, name=contigs.good.names, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)
classify.otu(list=contigs.good.unique.opti_mcc.list, taxonomy=contigs.good.unique.pds.wang.taxonomy, label=0.03)
rarefaction.single(shared=contigs.good.unique.opti_mcc.shared, calc=sobs, freq=2)
get.oturep(column=contigs.good.unique.dist, name=contigs.good.names, fasta=contigs.good.fasta, list=contigs.good.unique.opti_mcc.list)

Can you post/email the sequences that are in OTU02 along with the representative sequence?

Pat, I wonder if this is similar to the issue I emailed to mothur.bugs last night: a seeming mismatch between the shared file and rep.fasta?

I just emailed the sequences to the mothur admin.

If you can send the full contigs.fasta and group_cut.txt to mothur.bugs@gmail.com we can take a look.

When I look at the 19 sequences you sent me, I’m not getting them to cluster into a single OTU. There are two OTUs: one with a single Rhizobacter sequence and one with the 19 Limnobacter sequences. The distance between the two sequences is about 0.15, so there’s no way these should be in the same OTU. Something else I see is that the sequences vary significantly in length. The Rhizobacter sequence is 990 nt long and the others are >1200 nt long. This doesn’t seem to be causing the problem, but you really do need to filter your sequences so that they overlap the same alignment coordinates.

Like I said, email the two files to mothur.bugs and we can take a deeper look.
Pat
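
To make the point about overlapping alignment coordinates concrete, here is a minimal Python sketch (not part of mothur) that reports the first and last aligned base of each sequence and lists those that fail to cover a chosen region. It assumes an aligned FASTA in which "." and "-" mark gaps; the function names and the toy sequences are made up for illustration. Within mothur, the usual route is summary.seqs on the alignment followed by screen.seqs with start and end, then filter.seqs.

```python
GAP_CHARS = {".", "-"}  # assumption: both characters mark gaps in the alignment


def parse_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def aligned_span(seq):
    """Return 1-based (start, end) columns of the first and last base, or None if all gaps."""
    cols = [i for i, ch in enumerate(seq, start=1) if ch not in GAP_CHARS]
    return (cols[0], cols[-1]) if cols else None


def outside_region(seqs, start, end):
    """Return names of sequences that do not cover alignment columns start..end."""
    bad = []
    for name, seq in seqs.items():
        span = aligned_span(seq)
        if span is None or span[0] > start or span[1] < end:
            bad.append(name)
    return bad


# Toy alignment (hypothetical data): one read spans every column, one is much shorter.
example = """>long_read
ACGTACGTAC
>short_read
....ACGT..
"""

seqs = parse_fasta(example)
spans = {name: aligned_span(seq) for name, seq in seqs.items()}
print(spans)
print(outside_region(seqs, start=2, end=9))
```

Sequences flagged this way are the ones that would need to be removed or trimmed before distances are calculated, so that every pairwise comparison covers the same stretch of the gene.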


P.S. With the new OptiClust algorithm, you only need to run dist.seqs with a cutoff of 0.03, not 0.20.
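
For anyone wanting to sanity-check a distance like the 0.15 mentioned above against the 0.03 threshold, here is a rough Python sketch. It is a simplification and not the calculator that dist.seqs uses: it only counts mismatches over columns where both aligned sequences have a base and ignores gaps entirely. The function names and the two toy sequences are hypothetical.

```python
GAPS = {".", "-"}  # assumption: both characters mark gaps


def pairwise_distance(a, b):
    """Fraction of mismatches over columns where both aligned sequences have a base."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = mismatched = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAPS or y in GAPS:
            continue  # simplification: gapped columns are skipped, not penalized
        compared += 1
        if x != y:
            mismatched += 1
    if compared == 0:
        raise ValueError("sequences share no aligned bases")
    return mismatched / compared


def within_cutoff(a, b, cutoff=0.03):
    """True if the simplified distance is at or below the clustering cutoff."""
    return pairwise_distance(a, b) <= cutoff


# Two toy 20-column sequences that differ at 3 positions (hypothetical data).
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACCTACGA"

d = pairwise_distance(seq_a, seq_b)
print(round(d, 2), within_cutoff(seq_a, seq_b))
```

A pair at 0.15 sits far above a 0.03 cutoff, which is why distances beyond the cutoff add nothing to the clustering and do not need to be computed.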

OK, thanks. I just emailed the fasta and group files to mothur.bugs

I am not able to reproduce the error. I ran the following:

set.dir(input=…/…/otuerror, outputdir=…/…/otuerror)
summary.seqs(fasta=contigs.fasta)
screen.seqs(fasta=current, group=group_cut.txt, maxambig=14, maxlength=1400)
summary.seqs(fasta=contigs.good.fasta)
unique.seqs(fasta=contigs.good.fasta)
classify.seqs(reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)
align.seqs(fasta=contigs.good.unique.fasta, reference=silva.nr_v123.align, processors=2, flip=t)
dist.seqs(fasta=contigs.good.unique.align, cutoff=0.20)
cluster(column=current, name=current)
make.shared()
count.groups()
classify.otu(list=current, taxonomy=current, name=current, label=0.03)
get.oturep(column=current, name=current, fasta=current, list=current)
classify.seqs(fasta=contigs.good.unique.opti_mcc.0.03.rep.fasta, name=contigs.good.unique.opti_mcc.0.03.rep.names, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)

The results of the classify.otu command, in the contigs.good.unique.opti_mcc.0.03.cons.taxonomy file, are:

OTU Size Taxonomy
Otu01 144 Bacteria(100);"Proteobacteria"(100);Betaproteobacteria(100);Burkholderiales(100);Burkholderiaceae(100);Limnobacter(100);
Otu02 10 Bacteria(100);"Proteobacteria"(100);Betaproteobacteria(100);Burkholderiales(100);Comamonadaceae(100);Hydrogenophaga(100);
Otu03 9 Bacteria(100);"Proteobacteria"(100);Betaproteobacteria(100);Burkholderiales(100);Comamonadaceae(100);Comamonadaceae_unclassified(100);
Otu04 9 Bacteria(100);"Proteobacteria"(100);Gammaproteobacteria(100);Pseudomonadales(100);Pseudomonadaceae(100);Pseudomonas(100);
Otu05 4 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Sphingomonadales(100);Sphingomonadaceae(100);Sphingomonadaceae_unclassified(100);
Otu06 2 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Sphingomonadales(100);Sphingomonadaceae(100);Sphingopyxis(100);
Otu07 2 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Caulobacterales(100);Caulobacteraceae(100);Brevundimonas(100);
Otu08 1 Bacteria(100);Firmicutes(100);Bacilli(100);Bacillales(100);Paenibacillaceae_1(100);Paenibacillus(100);
Otu09 1 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Sphingomonadales(100);Erythrobacteraceae(100);Porphyrobacter(100);
Otu10 1 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Rhizobiales(100);Rhizobiaceae(100);Rhizobium(100);
Otu11 1 Bacteria(100);"Proteobacteria"(100);Gammaproteobacteria(100);Pseudomonadales(100);Pseudomonadaceae(100);Pseudomonas(100);
Otu12 1 Bacteria(100);"Actinobacteria"(100);Actinobacteria(100);Actinomycetales(100);Mycobacteriaceae(100);Mycobacterium(100);
Otu13 1 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Sphingomonadales(100);Sphingomonadaceae(100);Sphingomonas(100);
Otu14 1 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Sphingomonadales(100);Sphingomonadaceae(100);Sphingomonadaceae_unclassified(100);
Otu15 1 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Rhizobiales(100);Hyphomicrobiaceae(100);Devosia(100);
Otu16 1 Bacteria(100);"Proteobacteria"(100);Gammaproteobacteria(100);Pseudomonadales(100);Pseudomonadaceae(100);Rhizobacter(100);
Otu17 1 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Rhizobiales(100);Bradyrhizobiaceae(100);Afipia(100);
Otu18 1 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Rhizobiales(100);Bradyrhizobiaceae(100);Bosea(100);
Otu19 1 Bacteria(100);"Bacteroidetes"(100);"Sphingobacteria"(100);"Sphingobacteriales"(100);Chitinophagaceae(100);Sediminibacterium(100);

The classifications of the representative sequences, in contigs.good.unique.opti_mcc.0.03.rep.pds.wang.taxonomy, are:

2654762 Bacteria(100);"Proteobacteria"(100);Betaproteobacteria(100);Burkholderiales(100);Burkholderiaceae(100);Limnobacter(100);
2654749 Bacteria(100);"Proteobacteria"(100);Betaproteobacteria(100);Burkholderiales(100);Comamonadaceae(100);Hydrogenophaga(100);
2650222 Bacteria(100);"Proteobacteria"(100);Betaproteobacteria(100);Burkholderiales(100);Comamonadaceae(100);Comamonadaceae_unclassified(100);
2647982 Bacteria(100);"Proteobacteria"(100);Gammaproteobacteria(100);Pseudomonadales(100);Pseudomonadaceae(100);Pseudomonas(100);
2647979 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Sphingomonadales(100);Sphingomonadaceae(100);Sphingomonadaceae_unclassified(100);
2650231 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Sphingomonadales(100);Sphingomonadaceae(100);Sphingopyxis(100);
2647985 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Caulobacterales(100);Caulobacteraceae(100);Brevundimonas(100);
2650215 Bacteria(100);Firmicutes(100);Bacilli(100);Bacillales(100);Paenibacillaceae_1(100);Paenibacillus(100);
2654735 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Sphingomonadales(100);Erythrobacteraceae(100);Porphyrobacter(100);
2647987 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(99);Rhizobiales(99);Rhizobiaceae(99);Rhizobium(99);
2654824 Bacteria(100);"Proteobacteria"(100);Gammaproteobacteria(100);Pseudomonadales(100);Pseudomonadaceae(100);Pseudomonas(100);
2654825 Bacteria(100);"Actinobacteria"(100);Actinobacteria(100);Actinomycetales(100);Mycobacteriaceae(100);Mycobacterium(100);
2650195 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Sphingomonadales(100);Sphingomonadaceae(100);Sphingomonas(100);
2650194 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Sphingomonadales(100);Sphingomonadaceae(100);Sphingomonadaceae_unclassified(100);
2650191 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Rhizobiales(100);Hyphomicrobiaceae(100);Devosia(100);
2650190 Bacteria(100);"Proteobacteria"(100);Gammaproteobacteria(100);Pseudomonadales(100);Pseudomonadaceae(100);Rhizobacter(100);
2650188 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Rhizobiales(100);Bradyrhizobiaceae(100);Afipia(100);
2647989 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Rhizobiales(100);Bradyrhizobiaceae(100);Bosea(100);
2654740 Bacteria(100);"Bacteroidetes"(100);"Sphingobacteria"(100);"Sphingobacteriales"(100);Chitinophagaceae(100);Sediminibacterium(100);

Can you try downloading our latest version and see if you get the same results?
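
If you want to compare the two listings above without reading them line by line, a small Python sketch like the following could do it. It assumes the lines of the cons.taxonomy file (header removed) and of the rep taxonomy file are in the same OTU order, as they are in the output above; that pairing by position is an assumption, and the helper names are made up. The third pair of lines is hypothetical and only mimics the kind of disagreement reported in the first post.

```python
import re


def ranks(tax):
    """Split a taxonomy string into rank names, dropping (confidence) scores and quotes."""
    parts = [p for p in tax.strip().split(";") if p]
    return [re.sub(r"\(\d+\)$", "", p).strip('"') for p in parts]


def parse_cons_line(line):
    """Parse 'OTU Size Taxonomy' into (otu, size, rank list)."""
    otu, size, tax = line.split(None, 2)
    return otu, int(size), ranks(tax)


def parse_rep_line(line):
    """Parse 'SequenceName Taxonomy' into (name, rank list)."""
    name, tax = line.split(None, 1)
    return name, ranks(tax)


def mismatches(cons_lines, rep_lines):
    """Pair lines by position; return entries whose deepest rank differs."""
    found = []
    for cons, rep in zip(cons_lines, rep_lines):
        otu, _size, cons_ranks = parse_cons_line(cons)
        name, rep_ranks = parse_rep_line(rep)
        if cons_ranks[-1] != rep_ranks[-1]:
            found.append((otu, name, cons_ranks[-1], rep_ranks[-1]))
    return found


cons = [
    'Otu01 144 Bacteria(100);"Proteobacteria"(100);Betaproteobacteria(100);Burkholderiales(100);Burkholderiaceae(100);Limnobacter(100);',
    'Otu10 1 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(100);Rhizobiales(100);Rhizobiaceae(100);Rhizobium(100);',
    # Hypothetical line, for illustration only:
    'OtuXX 1 Bacteria(100);"Proteobacteria"(100);Betaproteobacteria(100);Burkholderiales(100);Burkholderiaceae(100);Limnobacter(95);',
]
rep = [
    '2654762 Bacteria(100);"Proteobacteria"(100);Betaproteobacteria(100);Burkholderiales(100);Burkholderiaceae(100);Limnobacter(100);',
    '2647987 Bacteria(100);"Proteobacteria"(100);Alphaproteobacteria(99);Rhizobiales(99);Rhizobiaceae(99);Rhizobium(99);',
    # Hypothetical line, for illustration only:
    'hypothetical_rep Bacteria(100);"Proteobacteria"(100);Gammaproteobacteria(100);Pseudomonadales(100);Pseudomonadaceae(100);Rhizobacter(100);',
]

result = mismatches(cons, rep)
print(result)
```

Confidence scores are ignored on purpose, so a 99 versus 100 difference such as the Rhizobium line does not count as a disagreement; only a different name at the deepest rank does.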

I repeated my analysis with flip=t in align.seqs and then I filtered the sequences to overlap the same alignment region. Now the OTU classifications in my taxonomy file and RDP classifications for representative sequences match well. Thank you for your help!
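
For readers wondering what flip=t changes: as I understand it, align.seqs then also tries the reverse complement of a sequence when the forward orientation aligns poorly, which matters for Sanger reads that may have been sequenced from either strand. A minimal Python sketch of the reverse complement itself (the function name is made up, and this is not mothur's code) looks like this:

```python
# Translation table for unambiguous bases plus N; other IUPAC ambiguity
# codes are left unchanged by this sketch, which is a known limitation.
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA (or RNA, read as DNA) sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


forward = "AACGT"
flipped = reverse_complement(forward)
print(flipped)
```

A read in the wrong orientation will still be forced into the alignment, but its bases land in meaningless columns, which inflates its distances to everything else.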

Hello everyone,

I have a related question: I created a customized training set to classify my short-read sequences. One of my OTUs is classified only to the family level and is unclassified at the genus/species level. However, my customized training set contains a sequence (~1000 bp) that shares 100% identity with the representative sequence of this same OTU (~200 bp). I have also added specific taxonomy for it to the training set, but in this case the consensus taxonomy given by mothur only agrees with my taxonomy down to the family level.
I am wondering if this is a bug or something that I am doing wrong. Any advice would be greatly appreciated!