I am pretty new to using mothur and am experimenting with a reduced dataset to try and set up a de-noising script.
It appears that shhh.seqs takes the same amount of time to run whether I specify 1 or 8 processors. This could be an error in how I have set things up at my end, but I thought I would check that the feature has actually been implemented, as the wiki has no information under the “processors” heading.
If you only have one flow file to denoise and you aren’t running MPI, then there won’t be much of a speedup. The only speedup you’ll see without MPI is in the initial distance calculation, which is relatively quick compared to everything else. It turns out that a lot of the calculation in the EM step of the algorithm isn’t sped up much by multiple processors. Our recent implementations of the processors option take the files listed in the flow.files file and send each to a separate processor. This should make things much faster.
Thanks for the quick reply. Does this apply to shhh.seqs as well as shhh.flows?
I mean, is there a speed improvement from running mothurMPI for shhh.seqs as well as shhh.flows, or do you just have to wait it out?
With shhh.seqs, calculating distances is the only thing taking time on my small dataset (~6 minutes, compared to about 1 minute in total for everything else up to that point).
In general, MPI doesn’t seem to do us much good. But both shhh.flows and shhh.seqs are parallelized in a similar manner. If you provide multiple files / groups, then each group is put on a different processor. If you only provide one file / group, then the data are split across the processors as much as possible.
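To make the scheme described above concrete, here is a rough Python sketch of that kind of work distribution. It is only an illustration and not mothur's actual code: the function name assign_work and the round-robin dealing are assumptions made for the example.

```python
from typing import Dict, List


def assign_work(groups: Dict[str, List[str]], processors: int) -> List[List[str]]:
    """Return, for each processor, the sequence names it would handle.

    Hypothetical illustration only; mothur's real scheduling may differ.
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    buckets: List[List[str]] = [[] for _ in range(processors)]
    if len(groups) > 1:
        # Several groups: each whole group goes to one processor,
        # dealt out round-robin (an assumption for this sketch).
        for i, name in enumerate(sorted(groups)):
            buckets[i % processors].extend(groups[name])
    else:
        # A single group: spread its sequences across all processors.
        for seqs in groups.values():
            for i, seq in enumerate(seqs):
                buckets[i % processors].append(seq)
    return buckets
```

With two groups and two processors, each processor gets one complete group; with a single group, that group's sequences are shared out between the processors instead.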
Hi, I think I have the previous problem sorted, thanks very much for the help.
Just one last query: is there any way to obtain a final fasta file, split by barcode, from the shhh.seqs command? The output when using the groups option appears to be combined fasta and names files, with a map file provided for each barcode.
If I want to look at the sequences belonging to each barcode separately, do I need to use the map files to split the combined files myself, or is there a pre-written script for this? (Alternatively, am I missing the point with how I am executing shhh.seqs?!)
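For reference, splitting a combined fasta file by hand could look something like the Python sketch below. It assumes a simple two-column, whitespace-separated table of sequence name and barcode/group; the map files written by shhh.seqs may be laid out differently, so they might need converting first. The function names read_groups and split_fasta are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def read_groups(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'sequence_name<whitespace>group' lines into a lookup table.

    Assumed layout; adapt this if your files look different.
    """
    mapping: Dict[str, str] = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def split_fasta(fasta_lines: Iterable[str],
                seq_to_group: Dict[str, str]) -> Dict[str, List[str]]:
    """Collect fasta records per group; records with no group are skipped."""
    out: Dict[str, List[str]] = defaultdict(list)
    current = None
    for raw in fasta_lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # The sequence name is the first token of the header line.
            fields = line[1:].split()
            current = seq_to_group.get(fields[0]) if fields else None
        if current is not None and line:
            out[current].append(line)
    return dict(out)
```

Each entry of the returned dictionary can then be written to its own file, one per barcode, for example with "\n".join(lines).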