I am running mothur’s implementation of SparCC on 135 OTUs, and it has been running for a week with the default settings.
It looks like the mothur implementation of SparCC has these defaults:
Samplings = 20
Iterations = 10
Permutations = 1000
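For reference, this is essentially the command I am running (the shared file name is just a placeholder for my own file; the parameter values are the defaults listed above):

    sparcc(shared=final.shared, samplings=20, iterations=10, permutations=1000)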
The SparCC documentation has these two parameters with the following defaults:
Iterations = 20
Simulations = 100 (you then run the bootstraps command with the number of bootstraps set to the number of simulations)
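For comparison, the workflow I was planning to run outside of mothur looks roughly like this, based on my reading of the usage examples in the SparCC README (file names are placeholders, and I may have some of the flags wrong):

    # estimate correlations; -i is SparCC's "iterations" (default 20)
    python SparCC.py otu_table.txt -i 20 --cor_file=cor_sparcc.out
    # generate the shuffled datasets; -n is the number of "simulations" (default 100)
    python MakeBootstraps.py otu_table.txt -n 100 -t bootstrap_#.txt -p bootstraps/
    # re-run SparCC on every shuffled dataset
    for i in $(seq 0 99); do
        python SparCC.py bootstraps/bootstrap_${i}.txt -i 20 --cor_file=bootstraps/cor_${i}.txt
    done
    # turn the bootstrap correlations into pseudo p-values
    python PseudoPvals.py cor_sparcc.out bootstraps/cor_#.txt 100 -o pvals_two_sided.txt -t two_sided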
I would like to run SparCC outside of mothur with the same parameters as the mothur implementation, for comparison, but I am not sure which values to use for the number of iterations and simulations.
Looking back at our code and their code, our use of samplings is the same as their iterations, and our use of permutations is the same as their simulations. A problem with the original code is that it was written in Python, which is very slow compared to C++. You could reduce the number of permutations to 100. Alternatively, we find that it is critical to filter the data, as they did in the original paper, so that you only work with the more abundant sequences.
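To mirror the mothur defaults in their code, that mapping works out to something like this (file names are placeholders, and the exact flags are worth double-checking against the SparCC help):

    # mothur samplings=20 -> SparCC.py -i 20 (their "iterations")
    # mothur permutations=1000 -> 1000 shuffled datasets (their "simulations"; or drop this to 100 as suggested above)
    python SparCC.py otu_table.txt -i 20 --cor_file=cor_sparcc.out
    python MakeBootstraps.py otu_table.txt -n 1000 -t bootstrap_#.txt -p bootstraps/

If I remember right, mothur's iterations parameter corresponds to SparCC's inner exclusion iterations (their -x/--xiter option), which also defaults to 10, so you shouldn't need to change that. The abundance filtering can be done on the shared file before running sparcc, for example with filter.shared or remove.rare.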
I agree that SparCC in mothur is much faster than SparCC.py, more than 10 times faster.
But I get quite different results from these two SparCC implementations on the same data.
Usually, SparCC in mothur reports more correlations, even though I have set the same parameters in SparCC.py and in mothur's SparCC.
How can I distinguish between them?