Understanding performance of chimera.uchime

adamc83 · October 15, 2013, 8:10pm

chimera.uchime doesn’t seem to take full advantage of multiple processors on my system. It starts the correct number of processes, then gradually uses fewer and fewer of them, eventually decaying to the point where it is processing samples sequentially.

Here’s what that looks like, plotting the CPU usage over time:

I’m processing ~75 samples with processors=32 on the fastest EC2 instance at Amazon (cc2.8xlarge, dual 8-core Xeons with Hyperthreading). I logged CPU usage with sar, and nothing else was running on the system. Unfortunately, understanding the mothur code that handles this is a bit beyond me, but it appears there might be an easy performance win for large analyses by reworking how uchime processes are started. The times plotted are real: it takes over 4 hours to chimera check the project I’m working on (soil is pretty crazy), but it looks to me like it doesn’t need to.

I’d be happy to provide more information, if useful.

pschloss · October 17, 2013, 6:20pm

First off, there are many ways to parallelize code, and what we perceive to be the fastest isn’t always possible because of how the algorithms are designed. If you had one sample and 100 processors, it would take just as long to run uchime in de novo mode as if you had 1 processor, because a single sample can’t be parallelized on its own. The parallelization that we developed puts each sample on a different processor. So take your case of 75 samples and 32 cores: 21 cores would each get 2 samples and 11 would each get 3 samples. Each core then processes its 2 or 3 samples, and when it finishes, that process ends. If all of your samples were the same size, then 21 cores would end together and the other 11 cores would end 50% later. But because real samples aren’t identical, this doesn’t happen. Instead, the cores with the small samples end first and you get a stair step where each step roughly corresponds to 3% (1/32) of the CPU capacity.

Make sense?
Pat
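
To make the arithmetic in the reply above concrete, here is a minimal Python sketch of splitting samples across processes up front. It is not mothur's code: the function names are hypothetical, and the "identical samples taking 1 time unit each" case is an assumption used only to reproduce the 21/11 split and the "50% later" figure.

```python
from collections import Counter

def split_counts(n_samples, n_procs):
    """Samples per process when the work is divided evenly up front."""
    base, extra = divmod(n_samples, n_procs)
    # 'extra' processes receive one more sample than the others
    return [base + 1] * extra + [base] * (n_procs - extra)

def running_after_each_exit(finish_times):
    """(time, processes still running) recorded each time a process ends."""
    ordered = sorted(finish_times)
    return [(t, len(ordered) - i - 1) for i, t in enumerate(ordered)]

counts = split_counts(75, 32)
tally = Counter(counts)          # how many processes got 2 vs. 3 samples

# Assumption: every sample takes exactly 1 time unit.
finish = [float(c) for c in counts]
steps = running_after_each_exit(finish)

step_size = 1 / 32               # share of total CPU freed when one process ends

print(dict(tally))
print(steps[20], steps[-1])
print(f"{step_size:.1%}")
```

With unequal sample sizes, the finish times spread out instead of clustering at two values, which gives the gradual stair-step decline in CPU usage described in the thread.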
