OTUs or sequences?

jsweas · August 30, 2016, 8:03am

At STAMPS this year the theme seems to be that OTUs are over and we should be calling sequences down to the nucleotide…

Rob Knight said QIIME will be doing that soon, Holmes and Meren presendted their methods to do it, and Robert Edgar talked said it’s the future and his method is coming soon.

I Tried the Holmes method. It works, and is easy to use at least to if you used R before.

Getting sequences exact seems better. Is there a mothur version of this coming? We didnt hear anything like that from Tracy but I figured its best to ask at the source.

pschloss · August 30, 2016, 12:26pm

Thanks for the question. I think that when you go to a meeting that is hosted by researchers that develop a method (e.g. oligotyping) you are going to hear a lot of good things about that method without much discussion of the problems. Meren has also said that 16S is dead, which of course it is not. Of course these types of biases would be found at a mothur workshop.

Oligotyping is something we’re definitely thinking about. The only thing I have to add at this point is that most of the methods out there are poorly validated and have been run on pretty crummy data with an error rate over 0.1% (ours is more around 0.01%). These methods would likely claim that non-random sequencing errors are separate oligotypes. For example, I suspect that if the methods were applied to a mock community you would see these oligotypes that differ by 1 nt as being unique oligotypes. With OTU-based approaches they would be absorbed. I also think that with PCR biases you would likely see “dynamics” for intragenomic variation of 16S sequences, which has frequently been used a claim to bolster support for the validity of an oligotype.

Pat

Kendra · August 30, 2016, 8:52pm

I haven’t had a chance to try DADA2 yet. They claim that they’re able to recover exactly the strains they put in and that mothur finds many spurious OTUs but they don’t seem to be doing any of the quality filtering steps? They jump from merge.contigs to chimera checking in their description? Anyone seen exactly how they processed with their comparison tools?

Kendra · August 31, 2016, 2:25pm

Ah they did have all their mothur code, I just didn’t look far enough in the file.

pcr.seqs(fasta=silva.bacteria.fasta, start=11894, end=25319, keepdots=F, processors=8)
#: make.contigs(file=m35M.files, processors=8)
#: screen.seqs(fasta=current, group=current, maxambig=0, maxlength=275, maxhomop=8)
#: unique.seqs()
#: count.seqs(name=current, group=current)
#: align.seqs(fasta=current, reference=silva.bacteria.pcr.fasta)
#: summary.seqs(fasta=m35M.trim.contigs.good.unique.align)
#: screen.seqs(fasta=current, count=current, start=1976, end=11549)
#: filter.seqs(fasta=current, vertical=T, trump=.)
#: count.seqs(name=current, group=current)
#: unique.seqs(fasta=current, count=current)
#: pre.cluster(fasta=current, count=current, diffs=2)
#: chimera.uchime(fasta=current, count=current, dereplicate=T)
#: remove.seqs(fasta=current, accnos=current)
#: summary.seqs(fasta=current, count=current)
#: dist.seqs(fasta=current, cutoff=0.20)
#: cluster(column=current, count=current)
#: make.shared(list=current, count=current, label=0.03)
#: get.oturep(column=current, count=current, list=current, fasta=current, label=0.03)

jsweas · September 2, 2016, 7:53am

Thanks for the information!

but why am I getting a lot more oTus from mothur than Dada2?

Kendra · September 2, 2016, 1:43pm

I’d bet its from the error correcting model dada2 uses. I haven’t read their paper close enough to fully understand how it works. It’s not going back to the truly raw data (the way that pyronoise error corrected), so they’re error correcting off of the fastq?

benjamin.callahan · November 23, 2016, 7:58pm

You are right on both points. It’s a bit unintuitive, since dada2 is resolving sequences exactly, but the removal of errors by dada2 almost always results in significantly fewer inferred sequence variants than the number of OTUs that are output by OTU pipelines like mothur/QIIME.

Error correction is based off the quality scores in the fastqs, combined with a statistical model of error abundances. And error-correction makes chimera removal an easier problem, which helps there as well.

Topic		Replies	Views
Are "unique seqs" analogous to ESVs? Theory behind mothur	3	872	October 20, 2019
Problem Qiime/mothur different results Integrating mothur with other programs	5	5939	August 4, 2015
Mothur vs dada2 Theory behind mothur	9	3544	June 12, 2020
Suspect to have an overall error rate of 0? Theory behind mothur	6	756	April 14, 2020
Too much OTUs Commands in mothur	1	1837	October 12, 2015

OTUs or sequences?

Related topics