shhh.flows memory requirements

Hi all!

I have trouble running shhh.flows on two flow files with a total of 768077 reads. This is how the command always finishes:

138400  13030   12996.9
138500  13067   13034.4
138596  13101   13068.4

Total time: 13127       12825.1

Clustering flowgrams...
Reading matrix:     |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

I’m assuming this is because there is not enough RAM to read the matrix? The workstation I’m running this on has 24 GB; is that not enough? This is in mothur v.1.23.1.



Hey Till,

Yeah, I’m pretty sure you’ll have problems with this :). Generally, people use barcodes so that the 770k sequences are distributed across 100-200 samples. Each sample (3000-8000 sequences) is then run through shhh.flows separately. To do this you need to first run trim.flows and then shhh.flows; we outline this in the SOP. The other advantage is that trim.flows will remove the low-quality flows. The memory requirement scales approximately with the square of the number of sequences, so if you double the number of sequences it will likely require four times the amount of RAM, etc. I’m not sure how much RAM would be required to process 770k sequences, but I suspect it’s a lot more than any of us have.
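In case it helps, that two-step workflow looks roughly like this (the file names and the oligos file are placeholders for your own data, and the diff settings are just examples, not SOP-mandated values):

```
trim.flows(flow=run1.flow, oligos=run1.oligos, pdiffs=2, bdiffs=1)
shhh.flows(file=run1.flow.files)
```

trim.flows splits the run into one flow file per barcode and writes a run1.flow.files list, which you can then hand to shhh.flows.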


Duh, I actually did run trim.flows, of course, but then mixed up the files. I now remember why I did, too, which leads to another question:
I have two sff files, which get extracted into two flow files. trim.flows, however, only takes one input file, as far as I can see. So I end up with each sample/barcode in two flow/fasta/qual files. Does shhh.flows take two files of the same sample/barcode, or at what point would I merge them in mothur?
For now I just pasted them together with cat before running trim.flows; we’ll see if that actually works, as this run is only split across 16 samples.

Hi Till,

I’m assuming that the same barcodes were used on both runs. So I would do this: trim.flows x2, shhh.flows x2, trim.seqs x2 -> merge your trim.fasta, trim.names, and groups files -> proceed as normal.
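A sketch of that pipeline in mothur, assuming the same oligos file fits both runs (all file names are hypothetical, and the intermediate output names are approximate; merge.files is mothur’s built-in concatenation command):

```
trim.flows(flow=run1.flow, oligos=run.oligos)
trim.flows(flow=run2.flow, oligos=run.oligos)
shhh.flows(file=run1.flow.files)
shhh.flows(file=run2.flow.files)
trim.seqs(fasta=run1.shhh.fasta, name=run1.shhh.names, oligos=run.oligos)
trim.seqs(fasta=run2.shhh.fasta, name=run2.shhh.names, oligos=run.oligos)
merge.files(input=run1.shhh.trim.fasta-run2.shhh.trim.fasta, output=combined.trim.fasta)
merge.files(input=run1.shhh.trim.names-run2.shhh.trim.names, output=combined.trim.names)
merge.files(input=run1.groups-run2.groups, output=combined.groups)
```

From combined.trim.fasta, combined.trim.names, and combined.groups onward you can proceed as with a single run.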


I am using mothur 1.22.2 and running shhh.flows on my <5000-sequence flow files, but it ends with a ‘segmentation fault’. What’s the problem?

Hi Pat,

I also have another question now that I’ve tried running it ‘properly’. I have split and trimmed one sff file into my single samples, generating flow files and a flow.files file listing them.
However, when I run shhh.flows with the flow.files file as input, it fills up all 24 GB of memory and then starts swapping, at which point I kill mothur, as that takes forever.
If I run the shhh.flows command manually on the single flow files listed in the flow.files file, one by one, everything works nicely, and the ‘reading flow’ stage takes less than a second. You mentioned that shhh.flows works on one file after the other if given a list in the flow.files file. Does it actually, or does it try to load all flow files first and then work on the following steps?