Multithreading in SparCC

Hi Pat and Sarah

I am running SparCC from within mothur (actually for the first time) and am trying to use multiple processors with “processors=20”. However, checking mothur’s activity with top, I can see that it is using only 1 processor (it has already been running for 1 hour).

Does SparCC support multi-threading?

Thanks guys,
Martin

It does, but not every step of the algorithm will benefit from parallelization. If it’s taking too long, then you might reconsider how you are filtering the shared file data going into sparcc.

Pat

Thanks Pat.

The OTU table is comparatively small, so it is probably more a matter of the settings. Did you evaluate the number of samplings, iterations, and permutations, so that mothur’s default values should be fine? In particular, I think the number of permutations should be higher than 1000 if I want to correct for multiple testing (e.g. with FDR). However, I struggle to tell the sampling and iteration parameters apart when comparing them to the paper.
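To make the multiple-testing concern concrete, here is a rough sketch in Python. It assumes the common (hits + 1) / (N + 1) permutation p-value estimator, which may not be what mothur or SparCC actually report, and a plain Benjamini-Hochberg correction over all OTU pairs. The function names and the 100-OTU table are made up for illustration.

```python
def pvalue_floor(n_perm):
    """Smallest p-value obtainable from n_perm permutations,
    assuming the (hits + 1) / (n_perm + 1) estimator (an assumption)."""
    return 1 / (n_perm + 1)


def n_pairs(n_otus):
    """Number of distinct OTU pairs, i.e. the number of tests."""
    return n_otus * (n_otus - 1) // 2


def bh_reject_count(pvalues, alpha=0.05):
    """Number of hypotheses rejected by Benjamini-Hochberg at level alpha."""
    m = len(pvalues)
    k = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        # BH rejects the `rank` smallest p-values for the largest rank
        # that satisfies p_(rank) <= alpha * rank / m.
        if p <= alpha * rank / m:
            k = rank
    return k


if __name__ == "__main__":
    m = n_pairs(100)             # hypothetical table with 100 OTUs
    floor = pvalue_floor(1000)   # best case with 1000 permutations
    for n_strong in (1, 98, 99):
        pvals = [floor] * n_strong + [1.0] * (m - n_strong)
        print(n_strong, "pairs at the floor ->",
              bh_reject_count(pvals), "rejected")
```

Under these assumptions, 100 OTUs give 4950 pairs, and the rank-1 BH threshold 0.05/4950 lies well below the floor 1/1001. A handful of strong correlations therefore cannot survive correction with 1000 permutations; in this toy setup at least 99 pairs must sit at the floor before any is rejected.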

Any further guidance is greatly appreciated.
Thanks,
Martin

Argh, sorry, it’s been a long time since I looked at the code and I’m drawing a blank. I’m pretty confident that the default settings were what we used in the Marino et al. PNAS paper. I do recall that having OTUs without many reads really bogs down the algorithm, which is why we created the filter.shared command.

Pat

Just on the subject of speed, there was this paper published in Frontiers in Microbiology last year that looked at some of the common correlation coefficients and statistics from these kinds of analyses.

Apart from advocating for the SparCC approach over methods like Spearman/Pearson, they provided a semi-objective metric for stripping out rare OTUs before performing these analyses. Working in SparCC, I noticed a real boost in speed using their method (computations that took several days now take a few hours or less), so it might be worth giving it a go if you’re having trouble.

Also, I can’t speak for the mothur implementation, but the actual SparCC program is restricted to 1 processor. The only point at which you can get multithreading is when doing the permutations for your p-value calculations (basically, create X random shuffles of the table and then divide them up over your number of processes).
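As a minimal sketch of that "divide the shuffles over processes" idea, the Python below splits N permutations into near-equal chunks and hands each chunk to a worker. It is not the SparCC or mothur code: a plain Pearson correlation between two OTU count vectors stands in for the SparCC estimate, the (hits + 1) / (N + 1) p-value is an assumption, and all function names are hypothetical.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def pearson(x, y):
    """Plain Pearson correlation; a stand-in for the SparCC estimate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def split_evenly(n_perm, n_workers):
    """Chunk sizes that sum to n_perm and differ by at most one."""
    base, extra = divmod(n_perm, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def count_extreme(job):
    """Worker: run `count` shuffles of x and count how many give a
    correlation at least as extreme as the observed one."""
    x, y, observed, count, seed = job
    rng = random.Random(seed)  # one independent seed per chunk
    x = list(x)
    hits = 0
    for _ in range(count):
        rng.shuffle(x)
        if abs(pearson(x, y)) >= abs(observed):
            hits += 1
    return hits


def permutation_pvalue(x, y, n_perm=1000, n_workers=4, seed=0,
                       executor_cls=ProcessPoolExecutor):
    """Permutation p-value with the shuffles divided over n_workers."""
    observed = pearson(x, y)
    jobs = [(x, y, observed, count, seed + i)
            for i, count in enumerate(split_evenly(n_perm, n_workers))
            if count > 0]
    with executor_cls(max_workers=n_workers) as pool:
        hits = sum(pool.map(count_extreme, jobs))
    return (hits + 1) / (n_perm + 1)
```

Calling `permutation_pvalue(x, y, n_perm=1000, n_workers=20)` would run 20 chunks of 50 shuffles each. With the default process pool, the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes. Only the permutation stage is spread out this way; the single estimate on the real table still runs on one core, which would be consistent with seeing one busy processor in top for a long stretch.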