shhh.flows gives an error

Hi,
I am trying to run shhh.flows on a dataset of 30 samples with a total of 255623 reads. The sequences come from a protein-coding gene. I am running the analysis on our local cluster, where I have requested 64 GB of memory.
I downloaded the latest version of mothur this week and installed it under my account on our cluster.

Linux version

Using ReadLine

Running 64Bit Version

mothur v.1.27.0
Last updated: 8/8/2012

So my command is this:

mothur > shhh.flows(file=OFS_ARHD.HTN0MRN04.flow.files, lookup=LookUp_Titanium.pat)

The output that I get is this:

Using 1 processors.

>>>>> Processing OFS_ARHD.HTN0MRN04.PM10-A04.Ac114f.flow (file 1 of 30) <<<<<
Reading flowgrams...
[ERROR]: std::bad_alloc has occurred in the ShhherCommand class function getFlowData. Please contact Pat Schloss at mothur.bugs@gmail.com, and be sure to include the mothur.logFile with your inquiry.

At this point I don’t know what is happening or why it is crashing. I do know that when I tried the previous version, mothur v1.26.0, I got the same error. I have tried running it both interactively and in batch mode. Any idea where I should look to find out why mothur is giving this error?

I have also tried it with an older version of mothur 1.25.1, which previously worked to process a 16S rRNA dataset. Now it also gives an error.

mothur v.1.25.1
Last updated: 5/14/2012

by
Patrick D. Schloss

Department of Microbiology & Immunology
University of Michigan
pschloss@umich.edu
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type ‘help()’ for information on the commands that are available

Type ‘quit()’ to exit program



mothur > shhh.flows(flow=OFS_ARHD.HTN0MRN04.RE-C40.Ac114f.flow)

>>>>> Processing OFS_ARHD.HTN0MRN04.RE-C40.Ac114f.flow (file 1 of 1) <<<<<
Reading flowgrams...
[ERROR]: std::bad_alloc has occurred in the ShhherCommand class function getFlowData. Please contact Pat Schloss at > , and be sure to include the mothur.logFile with your inquiry.

One thing that is out of my control here is that our cluster was upgraded to new hardware in the last few weeks. When I ran mothur v1.25 on the 16S rRNA sequences, that was on the old hardware; on the new hardware, v1.25 through v1.27 all give this problem.
This is the website with description of the new hardware: http://www.uio.no/english/services/it/research/hpc/abel/more/.

This morning I realized my mistake and how it affected shhh.flows.

Before running shhh.flows I had trimmed my sequences with the following command:

trim.flows(flow=OFS_ARHD.HTN0MRN04.flow, oligos=OFS_ARHD.oligos, pdiffs=2, bdiffs=1, minflows=200, maxflows=900, fasta=T, processors=2)

I thought I was being smart by setting maxflows to 900, not realizing that I had sequenced with GS-FLX Titanium, which only goes to 800 flows. trim.flows faithfully follows my command and writes flow files that start with the following lines:

900
HTN0MRN04IKSFZ 412 1.04 0.01 1.00 0.02 0.05 1.00 0.02 1.04 0.07 0.01 1.07 0.03 1.87 0.01 0.05 1.99 1.04 0.00 1.05 0.05 1.05 0.01 0.04 1.96 0.06 0.03 1.04 0.04 0.07 2.07 0.13 0.00 1.72 0.04 1.91 0.01 1.02 0.04 0.98 0.01

instead of

800
HTN0MRN04IKSFZ 412 1.04 0.01 1.00 0.02 0.05 1.00 0.02 1.04 0.07 0.01 1.07 0.03 1.87 0.01 0.05 1.99 1.04 0.00 1.05 0.05 1.05 0.01 0.04 1.96 0.06 0.03 1.04 0.04 0.07 2.07 0.13 0.00 1.72 0.04 1.91 0.01 1.02 0.04 0.98 0.01

When you then run shhh.flows, it aborts with the error described in my previous post. I guess this error is due to the incompatibility of 900 flows with a Titanium lookup file based on 800 flows, right?

So this morning I changed my trim.flows command by setting maxflows to 800 and reran it.

trim.flows(flow=OFS_ARHD.HTN0MRN04.flow, oligos=OFS_ARHD.oligos, pdiffs=2, bdiffs=1, minflows=200, maxflows=800, fasta=T, processors=2)

Now shhh.flows is running without aborting. So I guess I have solved my own problem here.
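
For anyone who wants to catch this kind of mismatch before starting shhh.flows, here is a minimal Python sketch that reads the first line of each trimmed .flow file and reports any file whose declared flow count is above 800. The function names and the 800 limit are my own assumptions for illustration (800 being the Titanium figure mentioned above); this is not part of mothur and does not reproduce its parsing.

```python
import sys

# Assumption taken from the post above: GS-FLX Titanium runs go to 800 flows.
TITANIUM_FLOWS = 800


def declared_flows(flow_path):
    """Return the flow count written on the first line of a .flow file."""
    with open(flow_path) as handle:
        fields = handle.readline().split()
    if not fields:
        raise ValueError(f"{flow_path}: empty header line")
    return int(fields[0])


def oversized_flow_files(flow_paths, limit=TITANIUM_FLOWS):
    """Return (path, declared_count) pairs whose header exceeds the limit."""
    oversized = []
    for path in flow_paths:
        count = declared_flows(path)
        if count > limit:
            oversized.append((str(path), count))
    return oversized


if __name__ == "__main__":
    # Hypothetical usage: python check_flows.py *.flow
    for path, count in oversized_flow_files(sys.argv[1:]):
        print(f"{path}: header declares {count} flows, limit is {TITANIUM_FLOWS}")
```

Run against the files produced with maxflows=900, a check like this would have flagged every file, since each one starts with the line 900.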

You’re also creating problems for yourself with this approach. By default we use min/maxflows=450 for the reasons described in our PLoS ONE paper. Basically, shhh.flows is virtually worthless if you use a range of flows instead of trimming your flowgrams to a common number of flows. In addition, it will run faster if you trim everything.

Pat

It’s been a long time since I wrote my post here, but I thought I should follow up on it.

After Pat's comment that I was making things very difficult for myself by using different minflows and maxflows values, I checked his paper and experimented with both different and equal values, and I have to admit that Pat is right.

Not only did my error disappear, but using equal minflows and maxflows values also saves a lot of time afterwards.
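
If you want a rough idea of how many reads would survive trimming to a common length such as 450, here is a small Python sketch. It assumes that the second field on each read line (412 in the example lines earlier in this thread) is that read's own flow count, and that reads shorter than the common length are dropped by the trimming step; both are my assumptions, and the function name is made up for illustration.

```python
def reads_reaching(flow_path, target=450):
    """Count reads whose per-read flow count is at least `target`.

    Returns a (reaching, total) tuple. Assumes line 1 is the header with the
    declared flow count and that field 2 of each read line is that read's
    own flow count.
    """
    reaching = 0
    total = 0
    with open(flow_path) as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # ignore blank or malformed lines
            total += 1
            if int(fields[1]) >= target:
                reaching += 1
    return reaching, total
```

For example, reads_reaching("some_sample.flow", 450) on a file trimmed with minflows=200 would show how many of those reads are long enough to keep at a common length of 450.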

:slight_smile: Every now and then…