Hi!
First, I am not actually sure whether this is a bug or a feature, but I didn’t quite know where else to post it.
I have an ongoing study here where the first part has now been published. We are now processing the second half of the study, in which we compare the results to the first part. However, we are running into a rather curious problem.
The thing is that since I am comparing two data sets, I am now running both of them in mothur. The first data set was processed earlier with version 1.14, and it is now being reprocessed together with the new data set with version 1.22.2. What we see is that we get very different OTU numbers for the first data set when clustering with 1.22.2 than we got with 1.14. I know the two runs are directly comparable, since I ran the same batch script with 1.22.2 as I did with 1.14.
The script:
set.dir(output=.)
summary.seqs(fasta=…/…/newdata/grouped/v6_control.fsa)
unique.seqs(fasta=…/…/newdata/grouped/v6_control.fsa)
summary.seqs(fasta=…/…/newdata/grouped/v6_control.unique.fsa)
#first the precluster alt
pre.cluster(fasta=…/…/newdata/grouped/v6_control.unique.fsa, name=…/…/newdata/grouped/v6_control.names, diffs=2)
summary.seqs(fasta=…/…/newdata/grouped/v6_control.unique.precluster.fsa)
pairwise.seqs(fasta=…/…/newdata/grouped/v6_control.unique.precluster.fsa, calc=eachgap, countends=F)
read.dist(column=…/…/newdata/grouped/v6_control.unique.precluster.dist, name=…/…/newdata/grouped/v6_control.unique.precluster.names)
cluster(method=average)
rarefaction.single()
summary.single(calc=sobs-coverage-chao-ace-npshannon)
With 1.14 I get 1402 OTUs at 0.03, while I get 1435 with 1.22.2. That is not a major difference in itself. What does worry me is that these sequences end up fully clustered, that is, everything in one OTU, at very different distances: with 1.14 everything collapses into a single OTU at 0.68, while with 1.22.2 this happens already at 0.35. This tells me that something is going on here.
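To make concrete what I mean by "everything in one OTU at distance X": in average-linkage (average neighbor) clustering, the height of the final merge is the distance at which all sequences collapse into a single OTU, and that height is very sensitive to how the large pairwise distances are treated (e.g. whether distances above a cutoff are kept, capped, or discarded). Here is a minimal pure-Python sketch of average linkage on a made-up 4-sequence distance matrix; this is only an illustration of the general method, not mothur's actual implementation, and the distances are invented.

```python
def average_linkage(dist):
    """Naive average-linkage clustering over indices 0..n-1.

    Returns a list of (merge_distance, clusters_remaining) tuples,
    one per merge, in the order the merges happen.
    """
    clusters = [[i] for i in range(len(dist))]
    history = []
    while len(clusters) > 1:
        # find the pair of clusters with the smallest average distance
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(dist[a][b] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        history.append((d, len(clusters)))
    return history

# made-up pairwise distances for 4 sequences: two tight pairs, far apart
D = [
    [0.00, 0.02, 0.40, 0.45],
    [0.02, 0.00, 0.42, 0.44],
    [0.40, 0.42, 0.00, 0.03],
    [0.45, 0.44, 0.03, 0.00],
]
merges = average_linkage(D)
# the height of the last merge is the distance at which everything
# becomes one OTU (about 0.4275 for this matrix)
print(merges[-1][0])
```

The point of the toy example: the early merges (and hence the OTU counts at 0.03) barely depend on the large distances, but the final "one OTU" height is an average over exactly those large values, so a change in how a newer version stores or truncates large distances would show up there first.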
The main reason I am asking about this is that I am going to compare already-published results to new data. I can try to explain OTU numbers that differ from what we published before by pointing to differing software versions, but that could raise some awkward methodological questions. I had a look at the changelogs, but I didn't notice any changes since version 1.14 to anything that I use in this script, so this big change is rather puzzling to me (although I might very well have missed something). At the moment, the only way I can see to get compatible results is to reinstall version 1.14, but that is really something I'd like to avoid.
Any thoughts on why this happens?
Thanks!
Karin