Clustering flowgrams: error

Hello,

I ran this command:
shhh.flows(flow=Glbl.trim.flow, processors=8)

When it starts to cluster the flowgrams, the program gives me the following error message:


123600 22186 22125.1
123700 22224 22162.3
123800 22261 22199.6
123800 22261 22199.6

Total time: 22261 23015.2

Clustering flowgrams…
********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||


[ERROR]: std::bad_alloc has occurred in the ShhherCommand class function getOTUData. Please contact Pat Schloss at mothur.bugs@gmail.com, and be sure to include the mothur.logFile with your inquiry.

I’d appreciate it if you could help me.

Thank you,
Armin

The error you are getting indicates you are running out of memory. The shhh.flows command is very memory intensive. Here are two options that may help reduce the memory needed.

  1. Run trim.flows with an oligos file before you run shhh.flows. When you run trim.flows with an oligos file, mothur separates your flows by sample, which lets you run smaller sets through shhh.flows. Trim.flows also creates a .flow.files file that can be passed to shhh.flows' file parameter; shhh.flows will then run each sample separately and combine the results into one .shhh.fasta file for you when complete.

  2. Use 1 processor. The more processors you use, the more memory is needed.
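For example, with placeholder file names (substitute your own flow and oligos files), the two-step workflow from option 1 would look like:

trim.flows(flow=yourdata.flow, oligos=yourdata.oligos, pdiffs=2, bdiffs=1)
shhh.flows(file=yourdata.flow.files, processors=1)

The first command writes one flow file per sample along with yourdata.flow.files; the second feeds that file to shhh.flows so each sample is denoised on its own and the results are combined into one .shhh.fasta file.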

Thanks, Westcott

I used 1 processor and got the same error. Also, when I set the oligos file in the trim.flows command, almost all the flows ended up in the scrap file. I checked the oligos file, and the samples have a reasonable number of reads when I run the trim.seqs command with the same oligos file. Maybe I should mention that in my oligos file the primer+tag combination was used as the barcode and no primer was defined.

Thank you very much in advance,
a

For the flows that were scrapped, what was the scrap code?

Below is the first line of the scrap file, which shows that the barcode is missing (right?), but as I mentioned, the barcodes work fine in trim.seqs:

720
#|lb 100 0.09 0.11 0.12 1.08 0.11 0.96 1.01 0.11 0.98 0.91 0.90 1.05 0.90 0.81 0

Thanks again

Yes, barcode and length are the reason it failed. If you want to send your logfile, flow and oligos files to mothur.bugs@gmail.com, I can try to troubleshoot it for you.

It looks like your data are bad. When I run:

trim.flows(flow=glbl.flow, oligos=oligos_2.txt, pdiffs=2, bdiffs=1, processors=1, minflows=360, maxflows=720)

among the first 10,000 reads, 8,000 don't have 360 good flows and the other 2,000 don't have the right barcode.

If you look at one sequence that went to the scrap file:


G329W3U02IGRE2|lb 60 0.07 0.10 0.10 1.06 0.09 0.96 1.02 0.09 1.01 0.98 1.04 1.08 1.00 0.88 1.01 0.11 0.10 0.86 0.95 0.09 0.09 0.88 0.97 0.09 1.06 0.89 0.93 0.07 1.01 0.85 0.92 0.09 0.96 0.90 0.16 0.10 0.96 0.16 0.10 1.05 1.77 0.12 0.99 0.17 0.98 0.11 0.18 1.73 0.15 0.78 1.49 0.23 0.75 0.23 0.43 1.36 1.05 0.12 0.11 1.32 0.28 0.61 0.13 0.94 1.92 0.33 1.76 0.59 1.28 0.11 1.30 0.61 0.86 0.12 0.91 1.07 1.15 0.36 0.14 1.07 1.07 0.35 0.35 0.66 0.31 1.18 0.42 0.35 1.10 1.40 0.39 0.33 0.71 1.43 0.66 0.77 0.74 0.48 1.14 0.84 0.15 0.20 1.11 1.06 0.11 0.41 0.57 0.77 0.41 0.38 1.10 0.69 0.44 0.31 1.46 0.16 0.81 0.40 1.40 0.12 0.68 0.26 ...

If you translate this to DNA sequence, the beginning is:

GACTACGTACACACTACTACTATGTTCTGGAC [The initial GACT is the test sequence]
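If you're curious how that translation works, here's a minimal sketch (not mothur's actual code), assuming the standard 454 flow order TACG: each flow reports a signal intensity for one base, and rounding the intensity to the nearest whole number gives the homopolymer length called for that base.

```python
# Sketch of flowgram -> sequence translation (not mothur's actual code).
# 454 flows cycle through the order TACG; each intensity, rounded to the
# nearest integer, is the homopolymer length called for that base.
FLOW_ORDER = "TACG"

def flows_to_seq(flows):
    bases = []
    for i, intensity in enumerate(flows):
        base = FLOW_ORDER[i % len(FLOW_ORDER)]
        bases.append(base * round(intensity))
    return "".join(bases)

# First 52 intensities of the scrapped read above:
flows = [0.07, 0.10, 0.10, 1.06, 0.09, 0.96, 1.02, 0.09, 1.01, 0.98,
         1.04, 1.08, 1.00, 0.88, 1.01, 0.11, 0.10, 0.86, 0.95, 0.09,
         0.09, 0.88, 0.97, 0.09, 1.06, 0.89, 0.93, 0.07, 1.01, 0.85,
         0.92, 0.09, 0.96, 0.90, 0.16, 0.10, 0.96, 0.16, 0.10, 1.05,
         1.77, 0.12, 0.99, 0.17, 0.98, 0.11, 0.18, 1.73, 0.15, 0.78,
         1.49, 0.23]

print(flows_to_seq(flows))  # prints GACTACGTACACACTACTACTATGTTCTGGAC
```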

This isn’t close to anything in your oligos file. A bigger problem for you is that your sequences are very short and noisy. I’d figure out what’s wrong with your sequencer and go from there…

Pat

Hi again,
Thank you for your help. I figured out the problem: I had to include the linker and sequencing primer in the oligos file. As you mentioned, the overall quality of my reads is low. I decided to trim the flows with noise=0.5 and got ~240,000 reads (out of ~400,000) in the output file (noise=0.7 gave only ~100,000); is this a wise thing to do?

Also, I would like to get a consensus sequence for each OTU, but the consensus.seqs command requires aligned sequences. The problem is that my reads are fungal ITS sequences, which as you know are not alignable across the kingdom. Do you have any idea how to solve this?

I appreciate your time.
Best wishes,
a

Thank you for your help. I figured out the problem: I had to include the linker and sequencing primer in the oligos file. As you mentioned, the overall quality of my reads is low. I decided to trim the flows with noise=0.5 and got ~240,000 reads (out of ~400,000) in the output file (noise=0.7 gave only ~100,000); is this a wise thing to do?

Um, probably not - but do you have mock community data? Without mock community data showing no bad effects, I wouldn't trust it. Also, the lookup files were originally developed for 454, not IonTorrent - so that might be another hiccup in the works. We're working on creating an IonTorrent lookup file, but it might be a few weeks.

Pat