Different numbers of OTUs from versions 1.14 and 1.22.2

Hi!

First, I am not actually sure whether this is a bug or a feature, but I didn’t quite know where else to post it.

I have an ongoing study whose first part has now been published. We are processing the second half of the study, in which we compare the results to the first part, and we are running into a rather curious problem.

Since I am comparing two data sets, I am now running both through mothur. The first data set was processed earlier with version 1.14 and is now being reprocessed, together with the new data set, with version 1.22.2. What we see is that clustering with 1.22.2 gives quite different OTU numbers for the first data set than we got with 1.14. I know the two runs should be comparable, since I used the same batch script with 1.22.2 as I did with 1.14.

The script:
set.dir(output=.)
summary.seqs(fasta=…/…/newdata/grouped/v6_control.fsa)
unique.seqs(fasta=…/…/newdata/grouped/v6_control.fsa)
summary.seqs(fasta=…/…/newdata/grouped/v6_control.unique.fsa)
#first the precluster alt
pre.cluster(fasta=…/…/newdata/grouped/v6_control.unique.fsa, name=…/…/newdata/grouped/v6_control.names, diffs=2)
summary.seqs(fasta=…/…/newdata/grouped/v6_control.unique.precluster.fsa)
pairwise.seqs(fasta=…/…/newdata/grouped/v6_control.unique.precluster.fsa,calc=eachgap, countends=F)
read.dist(column=…/…/newdata/grouped/v6_control.unique.precluster.dist, name=…/…/newdata/grouped/v6_control.unique.precluster.names)
cluster(method=average)
rarefaction.single()
summary.single(calc=sobs-coverage-chao-ace-npshannon)

With 1.14 I get 1402 OTUs at 0.03, while I get 1435 with 1.22.2. That is not a major difference. What does worry me, however, is the distance at which all of these sequences collapse into a single OTU: with 1.14 everything ends up in one OTU at 0.68, while with 1.22.2 it happens at 0.35. This tells me that something is going on here.

The main reason I am asking is that I am going to compare already published results to new data. I could attribute the difference in OTU numbers from what we published before to the change in software version, but that could raise some annoying methodological questions. I had a look at the changelogs, but I didn’t notice any changes since version 1.14 to anything I use in this script, so this big change is a bit puzzling to me (although I might very well have missed something). At the moment, the only way I can see to make the results comparable is to reinstall version 1.14, but that is really something I’d like to avoid.

Any thoughts on why this happens?

Thanks!

Karin

Hey Karin,

A few things…

  1. In general, the difference you see in the number of OTUs seems reasonable. If you ran either 1.14 or 1.22 multiple times, you would likely also get slightly different results.
  2. You can’t/shouldn’t run pre.cluster with unaligned sequences as you have done. pre.cluster assumes that the sequences are aligned, and the next version of mothur will put out a warning message to this effect (see the sketch after this list for how an aligned workflow typically looks). Running pre.cluster with diffs=2 is also problematic if you are interested in a 0.03 cutoff, since it would have lumped together sequences that were potentially 4 bp apart (4/60 ≈ 6.7% different on a ~60 bp fragment).
  3. It turns out that we did make a modification to the pairwise.seqs code in v.1.18 to fix a bug that people were seeing, and we inadvertently left it out of the release notes. I’m very sorry that this slipped through; we have just added it to the notes. I can’t say how much it affects the actual OTU assignments. For your case (V6) I don’t think it matters much, since the bug tended to make sequences look more different from each other than they really were. With a ~60 bp fragment, a 0.03 cutoff amounts to only 1 or 2 differences between sequences (0.03 × 60 ≈ 1.8). I actually think your pre.cluster settings would have masked the errors in pairwise.seqs.
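
To illustrate point 2, here is a rough sketch of how pre.cluster is usually fed aligned sequences. The reference alignment (silva.bacteria.fasta), the diffs value, and the output file names are placeholders on my part, and parameter names can differ between mothur versions, so treat this as an illustration rather than the exact commands to run:

#align the unique V6 reads against a reference alignment first
#(silva.bacteria.fasta is a placeholder; older versions may call the parameter template= instead of reference=)
align.seqs(fasta=v6_control.unique.fsa, reference=silva.bacteria.fasta)
summary.seqs(fasta=v6_control.unique.align)
#pre.cluster now sees aligned input; the right diffs value depends on read length and the cutoff you care about
pre.cluster(fasta=v6_control.unique.align, name=v6_control.names, diffs=1)
#distances can then be computed from the alignment with dist.seqs rather than pairwise.seqs
dist.seqs(fasta=v6_control.unique.precluster.align, calc=onegap, countends=F)
read.dist(column=v6_control.unique.precluster.dist, name=v6_control.unique.precluster.names)
cluster(method=average)

Whether diffs=1 or something else is appropriate depends on your fragment length and the cutoff you are interested in; the point is simply that the alignment has to come before pre.cluster and the distance calculation.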

Let me know if we can answer any other questions…
Pat