shhh.seqs

nelson · June 25, 2012, 3:30pm

Hi all,

I am pretty new to using mothur and am experimenting with a reduced dataset to try and set up a de-noising script.
It appears that shhh.seqs takes the same amount of time to run whether I specify 1 or 8 processors. This could be an error in how I have set things up at my end, but I thought I would check whether this feature has been implemented, as there is no information under the “processors” heading on the wiki.

Thanks very much,

Nelson

pschloss · June 25, 2012, 4:53pm

If you only have one flow file to denoise and you aren’t running MPI, then there won’t be much of a speedup. The only speedup you’ll see without MPI is in the initial distance calculation, which is relatively quick compared to everything else. It turns out that a lot of the calculation in the EM step of the algorithm isn’t sped up much by multiple processors. Our recent implementations of the processors option take the files listed in the flow.files file and send each to a separate processor. This should make things much faster.

Pat

nelson · June 27, 2012, 2:34pm

Hi,

Thanks for the quick reply. Does this apply to shhh.seqs as well as shhh.flows?
I mean, is there a speed improvement from doing a mothurMPI run on shhh.seqs as well as shhh.flows, or do you just have to wait it out?
Calculating distances with shhh.seqs is the only step taking time on my small dataset (~6 minutes, compared with about 1 minute in total for everything up to that point).

Thanks again,

Nelson

pschloss · June 28, 2012, 2:50pm

In general, MPI doesn’t seem to do us much good. But both shhh.flows and shhh.seqs are parallelized in a similar manner. If you provide multiple files / groups, then each group is put on a different processor. If you only provide one file / group, then the data are split across the processors as much as possible.

Pat
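
To make the two cases above concrete, here is a rough Python sketch of that kind of scheduling. It is only an illustration of the idea, not mothur's actual code: the function name `plan_work`, the round-robin assignment, and the contiguous chunking are all assumptions made for the example.

```python
def plan_work(groups, processors):
    """Return one work list per processor.

    groups: dict mapping a group name to its list of sequence names.
    Hypothetical helper for illustration; mothur's real scheduler may differ.
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    plan = [[] for _ in range(processors)]
    if len(groups) > 1:
        # Several groups: hand out whole groups, one after another.
        for i, name in enumerate(sorted(groups)):
            plan[i % processors].append((name, groups[name]))
    else:
        # A single group: split its sequences into contiguous chunks.
        for name, seqs in groups.items():
            size = -(-len(seqs) // processors)  # ceiling division
            for i in range(processors):
                chunk = seqs[i * size:(i + 1) * size]
                if chunk:
                    plan[i].append((name, chunk))
    return plan


if __name__ == "__main__":
    print(plan_work({"A": ["s1", "s2"], "B": ["s3"]}, 2))
    print(plan_work({"A": ["s1", "s2", "s3", "s4", "s5"]}, 2))
```

With a single group, only steps that can work on separate chunks benefit from extra processors, which fits the earlier point that the EM step sees little speedup.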

nelson · July 7, 2012, 3:40pm

Hi, I think I have the previous problem sorted; thanks very much for the help.
Just one last query: is there any way to obtain a final fasta file, split by barcode, out of the shhh.seqs command? The output when using the groups option appears to be combined fasta and names files, with a map file provided for each barcode.
If I want to look at the sequences belonging to each barcode separately, do I need to use the map files to split the combined files myself, or is there a pre-written script for this? (Alternatively, am I missing the point with how I am executing shhh.seqs?!)

Thanks very much,

Nelson

pschloss · July 9, 2012, 1:53pm

You can use the get.groups command to split the file up.
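
For anyone who would rather see what such a split does, or do it by hand, here is a minimal Python sketch. It assumes a mothur-style group file (sequence name, whitespace, group name, one pair per line) and a plain fasta file. The function names are made up for the example, and it ignores the names file, so it only places the representative sequences that are listed in the group file.

```python
from collections import defaultdict


def read_groups(lines):
    """Parse 'sequence_name<whitespace>group_name' lines into a dict."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def split_fasta_by_group(fasta_lines, seq_to_group):
    """Collect fasta lines per group; sequences with no group are skipped."""
    per_group = defaultdict(list)
    current = None
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            current = seq_to_group.get(name)
        if current is not None and line:
            per_group[current].append(line)
    return dict(per_group)


if __name__ == "__main__":
    groups = read_groups(["s1\tF1", "s2\tF2", "s3\tF1"])
    fasta = [">s1", "ACGT", ">s2", "GGCC", ">s3", "TTAA"]
    for group, lines in split_fasta_by_group(fasta, groups).items():
        print(group, lines)
```

To write real files, open the fasta and group files, pass the file objects in as the line iterables, and write each group's lines to its own output file.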

nelson · July 10, 2012, 11:49am

Perfect, thank you so much for your help.
