shhh.flows gets stuck at reading flowgrams

Hi,

I’m denoising two 454 data sets with shhh.flows. The first one went through without problems. I’ve trimmed the second one, but when I start the shhh.flows command it gets stuck at ‘reading flowgrams’ for the first file (36 files in total). I’ve tried to run some of the files individually, but the same thing happens. I can still run shhh.flows on the files from the first 454 set, so it seems to be an issue with the files from the second run. I did not get any error messages from trim.flows, though, and I’ve previously analysed the whole set in QIIME without problems. The two runs contain more or less the same number of sequences. When I start shhh.flows on the second set it uses quite a bit more memory on the server compared to the first set.

Has anyone encountered something similar, or does anyone have suggestions for what the problem and solution might be?

Thanks!

Sandra

How much memory does your computer have and how many sequences are in each of the individual flow files? Also, could you post the exact command you are trying to run?
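In case it helps anyone reading along, here is a quick way to get those per-file counts (a rough sketch, assuming the usual mothur .flow layout where the first line holds the number of flows per read and each following line is one read):

import glob

# Count the reads in every .flow file in the current directory,
# assuming one header line followed by one read per line.
for path in sorted(glob.glob("*.flow")):
    with open(path) as fh:
        n_reads = sum(1 for _ in fh) - 1  # subtract the header line
    print(path, n_reads)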

Hi,
Thanks for replying.
I’m running the command shhh.flows(file=…/data/G7T8VG302.flow.files, processors=8) on a server with 92348116k of free memory. There are on average 6000 seqs per sample, with one sample being heavily overrepresented at 50,000 seqs. But the individual smaller files also get stuck.

Prior to this I ran trim.flows(flow=…/data/G7T8VG302.flow, oligos=…/data/oligosmothur2.txt, pdiffs=2, bdiffs=1, minflows=360, maxflows=720, processors=2).

I can’t see what I could have done differently from the first run which worked fine.

Thank you, Sandra

It seems like the flow files are the problem; is there a way to check them? I’ve tried re-running trim.flows, but it didn’t help.

Sorry, I’m very new to this!

Sandra

No problems… Can you try running shhh.flows from within …/data?

I will give that a go tomorrow. My data dir is in another dir because of space issues, so while I can easily get files from there when in mothur, I’ll need a bit of help from my colleague to get the path ‘backwards’ to mothur while standing in the data dir, if this makes sense…
Will let you know how that looks!

Yes, that works!

Thank you for a great program and excellent forum, probably not the last time I will need help…

Thanks! Sandra

No - thank you - you found a “real” bug!

Hi. I’m having the same problem. The command shhh.flows(flow=GCVP7QP01.T03_CE01.flow) gets stuck when reading the flowgram files. It keeps using more and more memory (up to 96 GB), and eventually returns “[ERROR]: std::bad_alloc has occurred in the ShhherCommand class function getFlowData.” I’m running v.1.24.1 on 64-bit Windows. Any idea what the issue is? Thanks.

Georg

Is the first line of your flow file 450?
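(For anyone checking their own data: a minimal sketch to print that first line, assuming a plain-text .flow file as written by sffinfo; it should match the number of flows per read, 450 in this thread:)

# Print the first line of the flow file; this should be the number of
# flows per read for the run.
with open("GCVP7QP01.T03_CE01.flow") as fh:
    print(fh.readline().strip())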

Hi,

I’m also having problems with shhh.flows. I have 6 sff files, each with only one barcode. Four of those run fine, but with two files mothur crashes with the error [ERROR]: std::bad_alloc has occurred in the ShhherCommand class function initPyroCluster. I’ve tried with my own Mac, with a computer server, and with a computer cluster. With an increase in the available memory the error changed to ShhherCommand class function driver, but it still crashes in the same spot, after reading the matrix. The problem files are the biggest ones, so it may well be a memory problem, as the server and cluster have some limitations for the users, but I don’t know exactly what the problem could be. On the cluster I get ShhherCommand class function driver, and the limits I can use are:


core file size (blocks, -c) 0
data seg size (kbytes, -d) 8388608
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 268287
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) 8388608
open files (-n) 8192
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) 3600
max user processes (-u) 128
virtual memory (kbytes, -v) 8388608
file locks (-x) unlimited
and on the server (error in ShhherCommand class function initPyroCluster):
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 8271266
max locked memory (kbytes, -l) 256
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 256
virtual memory (kbytes, -v) 500000000
file locks (-x) unlimited
Those could be increased if I only knew which one is too low. The weird thing is that the number of sequences is not even that high; there is a total of ~120,000 seqs/file.
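For what it’s worth, one way to see which of those limits a process actually inherits in a given session is to query them from Python’s resource module (a Linux-oriented sketch; these limits are reported in bytes, and -1 means unlimited):

import resource

# The ulimit entries above map roughly onto these resource limits:
# -d data seg size, -m max memory size, -s stack size, -v virtual memory.
limits = {
    "data seg size (-d)": resource.RLIMIT_DATA,
    "max memory size (-m)": resource.RLIMIT_RSS,
    "stack size (-s)": resource.RLIMIT_STACK,
    "virtual memory (-v)": resource.RLIMIT_AS,
}
for name, lim in limits.items():
    soft, hard = resource.getrlimit(lim)
    print(name, "soft =", soft, "hard =", hard)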

Please, help me!!!

Does anyone have a resolution to this? I am also having the same problem, with shhh.flows failing at ‘Reading flowgrams…’

[ERROR]: std::bad_alloc has occurred in the ShhherCommand class function getFlowData.

And yes, my flow files have 450 on the first line.
I am currently using version 1.24.1. I have previously processed flow files of similar size using shhh.flows, so I don’t think memory is the issue, but I can’t remember if I was using an older version of mothur or not…

Thanks,

Vera.

How many sequences are in the flow files that are crashing? How many sequences are in the ones that aren’t crashing?

argh, I’m having the same problem again?!

In relation to another of my many inquiries, I’m trying to get an impression of why the seqs discarded in the trimming and denoising steps were binned, at the request of a reviewer… But unfortunately my one shhh.scrap.fasta file is empty, so I wanted to rerun the denoising…

Standing in my data directory with the flow file I run

shhh.flows(file=G7T8VG302cp.flow.files, processors=8)

And nothing happens except the memory usage increasing.

Any suggestions?
I’m fairly sure most of my seqs are discarded because of low quality/length, but is there an easier way to show this?

Thanks again again…

Hi there,

# of seqs for flow files that processed correctly: 1852, 1291, 8822, 13802, 2252, 14497, 2572, 1266, 2188, 2209, 1146, 1643, 2448, 1220, 2045, 1708, 3117, 1645, 2364, 2389, 1082, 1359, 3415, 1646, 3734, 2418, 2600, 3690, 2933, 3093

# of seqs for flow files that are causing the error: 2209, 1146, 1643, 2448, 1220, 2045, 1708, 3117, 1645, 2364, 1852, 2389, 1082, 1291, 8822, 13802, 2252, 14497, 2572, 1266, 1288, 1359, 3415, 1646, 3734, 2418, 2600, 3690, 2933, 3093

The sizes of the flow files are not that different, so I don’t think that is the issue. And I am using the same number of processors as before, e.g.:
shhh.flows(file=allB.flow.files, processors=4)

I have just tried running shhh.flows using only 1 processor, and that also caused the same error.

But now I am completely mystified. I went back and ran shhh.flows on the files that worked before to make sure they still work, and they started up fine. Then I decided to run shhh.flows again on the files that were not working, to make sure all the processes get started… and now the processes are running fine! I checked the logfile, and I swear I have been using the exact same command!! And I have been the only one using the server. Not sure what happened here?!

Thanks,
Vera.

hello all,

OK, I shouldn’t have sworn! I went back and checked my commands, and the previous posts in this thread, and here is the problem:

This command causes the error:
shhh.flows(flow=TaiE111222.flow.files, processors=4)

But NOT this:
shhh.flows(file=TaiE111222.flow.files, processors=4)

it’s file=, not flow=

Ack… glad I solved that mystery! So sorry for all the fuss!

-Vera.

…sorry for reposting, but I realized I got in the middle of Vera’s post.

argh, I’m having the same problem again with shhh.flows getting stuck?!

In relation to another of my many inquiries, I’m trying to get an impression of why the seqs discarded in the trimming and denoising steps were binned, at the request of a reviewer… But unfortunately my one shhh.scrap.fasta file is empty, so I wanted to rerun the denoising…

Standing in my data directory with the flow file I run

shhh.flows(file=G7T8VG302cp.flow.files, processors=8)

And nothing happens except the memory usage increasing. The flow.files file looks like this:

G7T8VG302cp.Ae.150C.v341F.flow
G7T8VG302cp.Ae.150M.v341F.flow
G7T8VG302cp.Ae.153C.v341F.flow
G7T8VG302cp.Ae.153M.v341F.flow
G7T8VG302cp.Ae.160C.v341F.flow
G7T8VG302cp.Ae.160M.v341F.flow
G7T8VG302cp.Ae.220M.v341F.flow
G7T8VG302cp.Ae.263C.v341F.flow
G7T8VG302cp.Ae.263M.v341F.flow
G7T8VG302cp.Ae.280C.v341F.flow

Any suggestions?
I’m fairly sure most of my seqs are discarded because of low quality/length, but is there an easier way to show this?

Thanks again again…

Hm, it seems like I actually have a problem with trim.flows.
The summary of the input file looks like this:

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 39 39 0 2 1
2.5%-tile: 1 54 54 0 4 11156
25%-tile: 1 234 234 0 5 111551
Median: 1 429 429 0 5 223102
75%-tile: 1 474 474 0 6 334652
97.5%-tile: 1 482 482 1 7 435047
Maximum: 1 1190 1190 26 31 446202
Mean: 1 350.223 350.223 0.0310442 5.29249

# of Seqs: 446202

But after running trim.flows

trim.flows(flow=G7T8VG302cp.flow, oligos=oligosmothur2.txt, pdiffs=2, bdiffs=1, minflows=360, maxflows=720, processors=2)

I end up with 56147 flows in G7T8VG302cp.scrap.flow and 39735 flows in G7T8VG302cp.trim.flow. What has happened to the rest?…
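(As a sanity check, here is a small sketch comparing those two counts against the 446202 reads reported above, again assuming each .flow file has one header line followed by one read per line:)

# Count reads in the trimmed and scrapped flow files and compare the sum
# with the original number of reads.
def count_reads(path):
    with open(path) as fh:
        return sum(1 for _ in fh) - 1  # skip the header line

trimmed = count_reads("G7T8VG302cp.trim.flow")
scrapped = count_reads("G7T8VG302cp.scrap.flow")
print(trimmed, scrapped, trimmed + scrapped)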

My oligos file looks like this:

forward CCTAYGGGRBGCASCAG v341F
#reverse GGACTACNNGGGTATCTAAT v806R
barcode ACGAGTGCGT Ae.150C
barcode ACGCTCGACA Ae.150M
barcode AGACGCACTC Ae.153C
barcode AGCACTGTAG Ae.153M
barcode ATCAGACACG Ae.160C
barcode ATATCGCGAG Ae.160M
barcode CGTGTCTCTA Ae.220M
barcode CTCGCGTGTC Ae.263C
barcode TAGTATCAGC Ae.263M
barcode TCTCTATGCG Ae.280C

And the resulting flow files will not run through shhh.flows.

Does anybody have any suggestions?

Thanks! Sandra

http://mothur.ltcmp.net/t/solved-groups-missing-in-sequence-and-name-files/965/1

Are you using 1.26.0?

No, 1.23.1. Could that be an issue? I’ve worked on two .sff files, and for the other one the sum of the scrapped and passed reads adds up to the initial number.