Large datasets: out of memory

I have been trying to cluster a large dataset of 30075 sequences, without any preclustering.

My commands are as follows:

summary.seqs(fasta=sequence.fasta)

align.seqs(fasta=sequence.fasta, reference=../silva.v4.fasta)

dist.seqs(fasta=sequence.align, output=lt, calc=eachgap)

cluster(phylip=sequence.phylip.dist)

It fails with an out-of-memory error and the following message:

[ERROR]: std::bad_alloc has occurred in the ClusterClassic class function getSmallCell. This error indicates your computer is running out of memory.  This is most commonly caused by trying to process a dataset too large, using multiple processors, or a file format issue. If you are running our 32bit version, your memory usage is limited to 4G.  If you have more than 4G of RAM and are running a 64bit OS, using our 64bit version may resolve your issue.  If you are using multiple processors, try running the command with processors=1, the more processors you use the more memory is required. Also, you may be able to reduce the size of your dataset by using the commands outlined in the Schloss SOP, http://www.mothur.org/wiki/Schloss_SOP. If you are uable to resolve the issue, please contact Pat Schloss at mothur.bugs@gmail.com, and be sure to include the mothur.logFile with your inquiry.

I have 8 GB of RAM with an Intel i5. If I increase the memory to 16 GB or 24 GB, will it help?
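As a rough back-of-the-envelope estimate (this assumes the clustering has to hold every pairwise distance in RAM at once, at 8 bytes per value, which may not match mothur's internals): 30075 sequences give 30075 × 30074 / 2 ≈ 4.5 × 10^8 distances, which is already roughly 3.6 GB before any other overhead.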

It’s hard to say whether it will fail or not. You could try setting cutoff=0.20 in cluster.
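For example (the file name here just carries over from the commands above):

cluster(phylip=sequence.phylip.dist, cutoff=0.20)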

I’m not sure what you’re doing upstream of these steps, but I think you’re making your life really hard. Things like pre.cluster, filter.seqs, screen.seqs, unique.seqs, etc. are all designed to reduce the number of unique sequences that have to be clustered, which keeps the process RAM-friendly and speeds it up.
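Just as a sketch of what that could look like with only a fasta file (the intermediate file names follow mothur's default output naming, so check your logfile for the exact names, and the screening criteria here are only placeholders):

screen.seqs(fasta=sequence.align, maxhomop=8)
filter.seqs(fasta=sequence.good.align, vertical=T, trump=.)
unique.seqs(fasta=sequence.good.filter.fasta)
pre.cluster(fasta=sequence.good.filter.unique.fasta, name=sequence.good.filter.names, diffs=2)

dist.seqs and cluster would then run on the much smaller pre.cluster output, passing the names file to cluster so the original sequence counts are preserved.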

Pat