OTUs or sequences?

At STAMPS this year the theme seems to be that OTUs are over and we should be calling sequences down to the nucleotide…

Rob Knight said QIIME will be doing that soon, Holmes and Meren presented their methods to do it, and Robert Edgar said it’s the future and that his method is coming soon.

I tried the Holmes method. It works, and it is easy to use, at least if you have used R before.

Getting exact sequences seems better. Is there a mothur version of this coming? We didn’t hear anything like that from Tracy, but I figured it’s best to ask at the source.

Thanks for the question. I think that when you go to a meeting that is hosted by researchers who developed a method (e.g. oligotyping), you are going to hear a lot of good things about that method without much discussion of the problems. Meren has also said that 16S is dead, which of course it is not. Of course, these types of biases would also be found at a mothur workshop.

Oligotyping is something we’re definitely thinking about. The only thing I have to add at this point is that most of the methods out there are poorly validated and have been run on pretty crummy data with an error rate over 0.1% (ours is closer to 0.01%). These methods would likely claim that non-random sequencing errors are separate oligotypes. For example, I suspect that if the methods were applied to a mock community, you would see sequences that differ by 1 nt being called as unique oligotypes. With OTU-based approaches they would be absorbed. I also think that with PCR biases you would likely see “dynamics” for intragenomic variation of 16S sequences, which has frequently been used as a claim to bolster support for the validity of an oligotype.
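
To put the two error rates in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes 250 nt reads and independent, uniform per-base errors; both are simplifications made up for illustration (the point above is precisely that real errors are non-random), so treat the numbers as an order-of-magnitude guide only.

```python
# Assumption: 250 nt reads with independent, uniform per-base errors.
READ_LENGTH = 250

def fraction_with_error(per_base_error_rate, read_length=READ_LENGTH):
    """Probability that a read carries at least one error."""
    return 1.0 - (1.0 - per_base_error_rate) ** read_length

# 0.1% and 0.01% are the two error rates quoted in the post above.
for rate in (0.001, 0.0001):
    frac = fraction_with_error(rate)
    print(f"per-base error rate {rate:.2%}: {frac:.1%} of reads have >= 1 error")
```

Under these assumptions, roughly 22% of reads carry at least one error at a 0.1% error rate, versus about 2.5% at 0.01%. Any method that treats every distinct sequence as real has far more erroneous sequences to deal with in the first case.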


I haven’t had a chance to try DADA2 yet. They claim that they’re able to recover exactly the strains they put in and that mothur finds many spurious OTUs but they don’t seem to be doing any of the quality filtering steps? They jump from merge.contigs to chimera checking in their description? Anyone seen exactly how they processed with their comparison tools?

Ah they did have all their mothur code, I just didn’t look far enough in the file.

pcr.seqs(fasta=silva.bacteria.fasta, start=11894, end=25319, keepdots=F, processors=8)
make.contigs(file=m35M.files, processors=8)
screen.seqs(fasta=current, group=current, maxambig=0, maxlength=275, maxhomop=8)
unique.seqs()
count.seqs(name=current, group=current)
align.seqs(fasta=current, reference=silva.bacteria.pcr.fasta)
summary.seqs(fasta=m35M.trim.contigs.good.unique.align)
screen.seqs(fasta=current, count=current, start=1976, end=11549)
filter.seqs(fasta=current, vertical=T, trump=.)
count.seqs(name=current, group=current)
unique.seqs(fasta=current, count=current)
pre.cluster(fasta=current, count=current, diffs=2)
chimera.uchime(fasta=current, count=current, dereplicate=T)
remove.seqs(fasta=current, accnos=current)
summary.seqs(fasta=current, count=current)
dist.seqs(fasta=current, cutoff=0.20)
cluster(column=current, count=current)
make.shared(list=current, count=current, label=0.03)
get.oturep(column=current, count=current, list=current, fasta=current, label=0.03)
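
The pre.cluster step with diffs=2 in the listing above is where rare near-identical sequences get absorbed into abundant ones. The following is a simplified Python sketch of that idea, not mothur's implementation: it assumes equal-length (aligned) sequences, uses a plain mismatch count, and runs on made-up toy data. The function names are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def merge_rare_into_abundant(counts, max_diffs=2):
    """Walk sequences from most to least abundant and fold each one into
    the first already-kept sequence within max_diffs mismatches."""
    merged = {}
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for seq, n in ordered:
        for kept in merged:
            if len(kept) == len(seq) and hamming(kept, seq) <= max_diffs:
                merged[kept] += n  # absorb the rarer sequence
                break
        else:
            merged[seq] = n  # nothing close enough: keep as its own sequence
    return merged

# Toy data (made up): two abundant sequences plus two rare 1-nt variants.
reads = Counter({
    "ACGTACGTAC": 1000,
    "ACGTACGTAT": 3,
    "ACGAACGTAC": 2,
    "TTGTACCTAA": 500,
})
print(merge_rare_into_abundant(reads))
# prints {'ACGTACGTAC': 1005, 'TTGTACCTAA': 500}
```

The two rare variants differ from the most abundant sequence by a single mismatch, so they are merged into it; the fourth sequence is four mismatches away and stays separate. An exact-sequence method without error correction would instead report all four.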

Thanks for the information!

But why am I getting a lot more OTUs from mothur than from DADA2?

I’d bet it’s from the error-correcting model DADA2 uses. I haven’t read their paper closely enough to fully understand how it works. It’s not going back to the truly raw data (the way that pyronoise error corrected), so they’re error correcting off of the fastq?

You are right on both points. It’s a bit unintuitive, since dada2 is resolving sequences exactly, but the removal of errors by dada2 almost always results in significantly fewer inferred sequence variants than the number of OTUs that are output by OTU pipelines like mothur/QIIME.

Error correction is based on the quality scores in the fastqs, combined with a statistical model of error abundances. Error correction also makes chimera removal an easier problem, which helps there as well.
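
As a loose illustration of how quality scores and abundances can be combined, here is a minimal Python sketch. It is not dada2's code or its actual model. The assumptions are mine: Phred+33 quality encoding, an error that is equally likely to produce any of the three other bases, a Poisson model for the number of error copies, and made-up read counts. The function names are hypothetical.

```python
import math

def phred_to_error_prob(quality_string, offset=33):
    """Convert a FASTQ quality string (Phred+33 assumed) into
    per-base error probabilities: p = 10^(-Q/10)."""
    return [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]

def poisson_tail(k, lam, max_terms=1000):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0, summed from k upward."""
    if k <= 0:
        return 1.0
    # P(X = k), computed in log space to avoid overflow
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    for i in range(k, k + max_terms):
        total += term
        term *= lam / (i + 1)
    return min(total, 1.0)

# Made-up scenario: an abundant sequence with 1000 reads, and a variant
# that differs at one position where the quality score is Q30 ('?').
parent_reads = 1000
p_err = phred_to_error_prob("?")[0]   # Q30 -> 0.001
lam = parent_reads * p_err / 3        # expected copies of this specific error

for child_reads in (2, 20):
    p = poisson_tail(child_reads, lam)
    print(f"variant seen {child_reads} times: P(>= that many by error) = {p:.3g}")
```

In this toy setup, a variant seen only twice is reasonably consistent with sequencing error, while one seen 20 times is far too abundant to be explained that way and would be kept as a real sequence. That is one way to see how an approach of this kind can resolve exact sequences and still report fewer of them than an OTU pipeline.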