multithreading in sparcc

harti77 · June 24, 2015, 7:43pm

Hi Pat and Sarah

I am running SparCC from within mothur (actually for the first time) and am trying to use multiple processors with “processors=20”. However, checking mothur’s activity with top, I can see that it is using only 1 processor (it has already been running for 1 hour).

Does SparCC support multi-threading?

Thanks guys,
Martin

pschloss · June 29, 2015, 8:23pm

It does, but not every step of the algorithm will benefit from parallelization. If it’s taking too long, then you might reconsider how you are filtering the shared file data going into sparcc.

Pat

harti77 · June 30, 2015, 6:29am

Thanks Pat.

The OTU table is comparatively small, thus it is probably more a matter of the settings. Did you evaluate the number of samplings, iterations, and permutations such that the default values of mothur should be fine? I think in particular the number of permutations should be higher than 1000 in case I want to correct for multiple testing (e.g. with FDR). However, I struggle to distinguish between the sampling and iteration parameters when comparing to the paper.

Any further guidance is greatly appreciated.
Thanks,
Martin
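
To put rough numbers on the permutation question above, here is a minimal Python sketch. It is not taken from mothur or SparCC. It assumes the common permutation estimator p = (hits + 1) / (N + 1), which may differ from what either program computes, and it uses a plain Benjamini-Hochberg step-up procedure. The function names `min_pvalue` and `benjamini_hochberg` are made up for illustration.

```python
def min_pvalue(n_permutations):
    """Smallest p-value reachable with N permutations.

    Assumes the (hits + 1) / (N + 1) estimator; this is an assumption,
    not necessarily what mothur or SparCC computes.
    """
    return 1.0 / (n_permutations + 1)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the test is rejected at FDR alpha."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value is <= k * alpha / m.
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            max_k = rank
    # Reject every test ranked at or below that k.
    rejected = [False] * m
    for i in order[:max_k]:
        rejected[i] = True
    return rejected


# Hypothetical example: 100 OTUs give 100 * 99 / 2 = 4950 pairwise tests.
n_tests = 100 * 99 // 2
for n_perm in (1000, 100000):
    # One pair sits at the permutation floor, all others are clearly null.
    pvals = [min_pvalue(n_perm)] + [0.5] * (n_tests - 1)
    print(n_perm, min_pvalue(n_perm), benjamini_hochberg(pvals)[0])
```

Under these assumptions, a single strongly correlated pair cannot survive the correction with 1000 permutations once there are more than about 50 pairs, because its p-value cannot drop below 1/1001. The floor matters less when many pairs sit at the minimum, since Benjamini-Hochberg compares the k-th smallest p-value with k * alpha / m.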

pschloss · July 9, 2015, 12:48pm

Argh, sorry, it’s been a long time since I looked at the code and I’m drawing a blank. I’m pretty confident that the default settings were what we used in the Marino et al. PNAS paper. I do recall that having OTUs without many reads really bogs down the algorithm, which is why we created the filter.shared command.

Pat

dwaite · July 9, 2015, 8:19pm

Just on the subject of speed, there was a paper published in Frontiers in Microbiology last year that looked at some of the common correlation coefficients and statistics from these kinds of analyses.

Apart from advocating for the SparCC approach over methods like Spearman/Pearson, they provided a semi-objective metric for stripping out rare OTUs before performing these analyses. Working in SparCC, I noticed a real boost in speed using their method (several-day computations now take a few hours or less), so it might be worth giving it a go if you’re having trouble.

Also - I can’t speak for the mothur implementation, but the actual SparCC program is restricted to 1 processor. The only point at which you can get multithreading is when doing the permutations for your p-value calculations (basically, create X random shuffles of the table and then divide them up over your number of processes).
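
The "divide the shuffles over your processes" idea can be sketched in Python as below. This is only an illustration under stated assumptions, not the SparCC or mothur code: a plain Pearson correlation stands in for the SparCC estimate, only one pair of OTUs is tested, and the names `split_counts`, `count_extreme`, and `permutation_pvalue` are hypothetical.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def split_counts(n_permutations, n_workers):
    """Divide n_permutations as evenly as possible across n_workers."""
    base, extra = divmod(n_permutations, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def pearson(x, y):
    """Plain Pearson correlation (a stand-in for the SparCC estimate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(vx * vy)


def count_extreme(job):
    """Worker: shuffle y `count` times and count |r| >= |observed|."""
    x, y, observed, count, seed = job
    rng = random.Random(seed)  # one independent generator per worker
    y = list(y)                # copy so the caller's data is untouched
    hits = 0
    for _ in range(count):
        rng.shuffle(y)
        if abs(pearson(x, y)) >= abs(observed):
            hits += 1
    return hits


def permutation_pvalue(x, y, n_permutations=1000, n_workers=4,
                       executor_cls=ProcessPoolExecutor):
    """Permutation p-value with the shuffles divided over n_workers.

    executor_cls can be swapped for ThreadPoolExecutor for a quick check
    where starting processes is inconvenient; threads will not speed up
    this pure-Python loop.
    """
    observed = pearson(x, y)
    jobs = [(x, y, observed, count, seed)
            for seed, count in enumerate(split_counts(n_permutations, n_workers))
            if count > 0]
    with executor_cls(max_workers=n_workers) as pool:
        hits = sum(pool.map(count_extreme, jobs))
    # (hits + 1) / (N + 1) is an assumed estimator, not necessarily SparCC's.
    return (hits + 1) / (n_permutations + 1)
```

A call such as `permutation_pvalue(x, y, n_permutations=1000, n_workers=20)` would then spread the 1000 shuffles over 20 processes. When run as a script with the default process pool, the call should sit under an `if __name__ == "__main__":` guard. Only the shuffling step is divided up here, which matches the point that the correlation estimate itself runs on a single processor.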
