shhh.flows

Hi,

I’m just wondering how shhh.flows allocates the work when running multiple processors. For example, if I had 50 flow files and processors=8, does it send one flow file to each of the 8 processors to be denoised, or do multiple processors work together on each denoising task? The reason I ask is that I’m trying to work out the limiting factor in completing a job. For example, would 50 processors denoising 50 flows take the same time as 100 processors denoising 100 flows, assuming the flow that takes the longest to denoise was in the smaller set (the 50 flows)? That is, is the speed of the job limited by the time it takes to run the slowest denoise? It is pretty evident that some flows are denoised quickly, while others take considerable time. Hopefully that makes sense; I’m not sure after re-reading it…

Secondly, the output from shhh.flows includes the fasta, names, etc. files and, while it is running, some temp.fasta and temp.names files that are the same size as the .fasta etc. files. Am I right in assuming that the .fasta files are the complete files for that particular flow, and the temp files are the files which will be concatenated to make the final “complete” fasta etc. files? The reason I ask is that with large jobs the cluster I use sometimes goes down before completion, and I figured that if the above is correct I wouldn’t have to re-run the complete job, but could remove the completed files from the flow.files, run shhh.flows on the remainder, and manually concatenate the .fasta etc. files at the end. Is this correct?


Thanks,

Andrew

I’m just wondering how shhh.flows runs/allocates the task when running multiple processors.

In the most recent version of mothur, if you have 50 flow files and say 8 processors, it will send one file to each processor. We found that the initial distance calculation is very fast compared to the expectation-maximization step. In our hands, using the MPI and other versions, the EM step doesn’t really benefit much from parallelization. So we decided to put different files on different processors so that you could run multiple files through the EM step at the same time - make sense? One problem we’ve discovered with this is that the clustering step after the distance calculation is a RAM hog (surprise!), so trying to cluster 8 flow files at the same time may crash the command. Of course, if you have different sized flow files, they will take varying amounts of time and RAM. If you have 8 processors available and 50 flow files, it might actually be fastest, and most likely to complete without crashing, if you set processors to 2 or 3 and don’t run anything else on that computer/node that will suck up RAM. I guess the short answer is to play with it a bit :).
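Since each flow file runs whole on one processor, total run time behaves like a classic makespan problem: once you have enough processors, the slowest file dominates. A minimal Python sketch of that scheduling model, with made-up per-file times (this simulates the idea, not mothur’s actual dispatcher):

```python
import heapq

def makespan(file_times, processors):
    """Simulate one-file-per-processor scheduling: a processor picks up
    the next file in the list as soon as it finishes its current one.
    Returns the total wall-clock time for the batch."""
    finish = [0.0] * processors  # min-heap of per-processor finish times
    heapq.heapify(finish)
    for t in file_times:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + t)
    return max(finish)

# Hypothetical per-file denoising times (hours): mostly fast, a few slow.
times = [0.1] * 45 + [2, 5, 12, 24, 45]

for p in (2, 8, 50):
    print(p, "processors:", makespan(times, p), "hours")
```

With these made-up numbers, going from 8 to 50 processors barely helps: the 45-hour file sets the floor either way, which is the tradeoff Pat describes above.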

On your second question, I think you’re right. The fasta and name files are written as each flow file is finished.
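If the per-file .fasta and .names outputs really are complete once written, the recovery Andrew describes reduces to re-running shhh.flows on the unfinished flow files and then concatenating the per-file outputs. A minimal Python sketch (the filenames are hypothetical):

```python
from pathlib import Path

def concat_outputs(parts, combined):
    """Concatenate per-flow-file output files (e.g. the .shhh.fasta
    files) into one combined file; the same idea applies to .names."""
    with open(combined, "w") as out:
        for part in parts:
            out.write(Path(part).read_text())

# e.g. concat_outputs(["sampleA.shhh.fasta", "sampleB.shhh.fasta"],
#                     "all.shhh.fasta")
```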

Pat

Hi Pat, I’m in a good situation: I have access to a facility that allows me to run shhh.flows over as many processors as I like (conceptually at least, but let’s say on lots of processors, given I’m not the only user!) without having to use the MPI version; that is, I can set processors equal to the number of .flow files I have. So I’m trying to get this running more efficiently, in terms of not overstaying my welcome on this facility, and have a couple more questions.

  1. When I choose processors = # of .flow files, each flow is sent to a separate processor, some jobs completing in seconds, some in hours, some in days. So the total run time is (more or less) the run time of the .flow that takes the longest to denoise. From your reply above I’m continuing to assume this is correct?
  2. From my testing with processors=128, mothur will denoise 128 flows with 3k reads per flow in around 45 hours (using the Quince defaults for trim.flows). The problem with this is that lots of the flows are denoised very quickly, but some aren’t, which means that lots of processors potentially sit idle, but allocated to my job, while they wait for the longest task to complete. Do you guys have any feeling for the proportion of flows that take longer than, say, a few hours to denoise, and if so, how many processors is it best to allocate if that number can equal the number of flows? That is, what do you think the trade-offs in time to run the batch of shhh.flows would be if I ran them on 64 processors, or 32? I’m guessing it’ll just be a matter of me testing, but thought you guys (or someone else) might have done this.
  3. Is there any way to estimate the denoising time for each file (e.g., does the size of the .flow give any clues here?). I can’t see any correlations with size etc., but you guys might have worked something out here?
  4. How does mothur send the .flows out to be denoised? Is it done in the order they appear in the flow.files mapping file? It’d be great if there were some way to tell which will take the longest and make sure they are queued first (if the number of processors < # of files); it seems like it’d be good to have the file that will take the longest denoised first, for example.

Anyway, any advice for those of us using larger facilities and how to do this more efficiently would be appreciated.

Thanks,

Andrew

  1. When I choose processors = # of .flow files, each flow is sent to a separate processor, some jobs completing in seconds, some in hours, some in days. So the total run time is (more or less) the run time of the .flow that takes the longest to denoise. From your reply above I’m continuing to assume this is correct?

Yup.

  2. From my testing with processors=128, mothur will denoise 128 flows with 3k reads per flow in around 45 hours (using the Quince defaults for trim.flows). The problem with this is that lots of the flows are denoised very quickly, but some aren’t, which means that lots of processors potentially sit idle, but allocated to my job, while they wait for the longest task to complete. Do you guys have any feeling for the proportion of flows that take longer than, say, a few hours to denoise, and if so, how many processors is it best to allocate if that number can equal the number of flows? That is, what do you think the trade-offs in time to run the batch of shhh.flows would be if I ran them on 64 processors, or 32? I’m guessing it’ll just be a matter of me testing, but thought you guys (or someone else) might have done this.

Yeah, it’ll be on a case-by-case basis. My experience has been that more time is spent swapping data between processors than is actually spent processing, so the parallelization doesn’t help a whole lot in the end.


  3. Is there any way to estimate the denoising time for each file (e.g., does the size of the .flow give any clues here?). I can’t see any correlations with size etc., but you guys might have worked something out here?

It’s probably something like: double the size and the time goes up 4- or 8-fold. Not very precise, sorry!
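Pat’s rule of thumb (double the size → 4- to 8-fold time) corresponds to run time scaling roughly like size^k with k between 2 and 3. A back-of-the-envelope extrapolation in Python, treating the slowest ~3k-read file from Andrew’s 45-hour batch as the reference point; the exponent k is a guess, not a measured value:

```python
def estimate_hours(size_reads, ref_reads=3000, ref_hours=45.0, k=2.5):
    """Crude extrapolation from one observed run: doubling the reads
    multiplies time by 2**k, so k=2 matches the 4-fold end of Pat's
    rule and k=3 the 8-fold end."""
    return ref_hours * (size_reads / ref_reads) ** k

print(estimate_hours(6000))        # double the reads
print(estimate_hours(6000, k=2))   # quadratic (4-fold) assumption
print(estimate_hours(6000, k=3))   # cubic (8-fold) assumption
```

Given the thread’s observation that size correlates poorly with run time in practice, treat this as an upper-bound sanity check rather than a schedule.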

  4. How does mothur send the .flows out to be denoised? Is it done in the order they appear in the flow.files mapping file? It’d be great if there were some way to tell which will take the longest and make sure they are queued first (if the number of processors < # of files); it seems like it’d be good to have the file that will take the longest denoised first, for example.

Pretty sure it’s in the order they appear in the flow.files file. Also, there’s nothing to say that you have to use that file. You could run everything one at a time.
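Since the files appear to be dispatched in flow.files order, one workaround for Andrew’s fourth question is to rewrite that file with the biggest .flow files first, using on-disk size as a rough (and, per this thread, imperfect) proxy for run time. A hypothetical Python sketch, assuming one flow file path per line; adapt it if your flow.files has extra columns:

```python
import os

def reorder_by_size(flow_files_in, flow_files_out):
    """Rewrite a flow.files list so the largest .flow files (a rough
    proxy for the slowest denoising jobs) are queued first."""
    with open(flow_files_in) as f:
        paths = [line.strip() for line in f if line.strip()]
    paths.sort(key=os.path.getsize, reverse=True)
    with open(flow_files_out, "w") as f:
        f.write("\n".join(paths) + "\n")
```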

Anyway, any advice for those of us using larger facilities and how to do this more efficiently would be appreciated.

Yeah, you could follow the SOP! :slight_smile: Seriously, the method assumes that the errors are evenly distributed across the flowgram. We showed in the PLoS ONE paper that errors pick up after 450 flows and that, at that point, the sequencer is basically generating biodiversity. The overall algorithm is to calculate distances between flowgrams, cluster those distances, and then move flowgrams around until the optimal configuration is achieved. When you have a lot of artificial biodiversity you’ll have a lot more clusters to optimize, which really slows things down. Also, as we showed, the error rate doesn’t get reduced. I think you’ll be a lot happier using minflows=450/maxflows=450.
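In mothur batch syntax, that recommendation looks something like the following (the flow and oligos filenames are placeholders; trim.flows writes the .flow.files mapping that shhh.flows then consumes):

```
trim.flows(flow=GQY1XT001.flow, oligos=GQY1XT001.oligos, minflows=450, maxflows=450)
shhh.flows(file=GQY1XT001.flow.files, processors=2)
```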