Cluster.split runtime problem

Cheny · June 2, 2015, 6:48pm

Hi, I am a new user of mothur and am following the MiSeq SOP. The input to cluster.split is ~1 million reads. It crashed after running for 3 days on 30 processors. Is there an alternative way to do this? Thanks!

Kendra · June 3, 2015, 3:23am

Did you run unique.seqs and pre.cluster? I have a dataset of hundreds of soil samples that still had ~1M seqs after pre.cluster, but I was able to generate OTUs using 1 processor on a machine with 128 GB RAM in 9 days.

Cheny · June 4, 2015, 1:25pm

Yes, I ran unique.seqs before pre.cluster. I have 24 samples. I guess hierarchical clustering on a 1M x 1M matrix takes a very long time. Would it make sense to include the uclust method as an option for large datasets?
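A rough back-of-the-envelope sketch of why a 1M x 1M matrix is hard: n sequences have n(n-1)/2 unique pairwise distances. The memory figure below assumes every distance is stored as a 4-byte float; that is an illustrative assumption and does not model how mothur actually stores distances, so treat it as an upper-bound illustration rather than a prediction.

```python
def pairwise_distances(n: int) -> int:
    """Number of unique pairwise distances among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


def dense_matrix_bytes(n: int, bytes_per_distance: int = 4) -> int:
    """Rough memory needed to hold every pairwise distance.

    bytes_per_distance=4 is an assumption (one 32-bit float per pair);
    it is not how mothur necessarily stores its distances.
    """
    return pairwise_distances(n) * bytes_per_distance


if __name__ == "__main__":
    # Sequence counts taken from this thread.
    for n in (272_514, 1_000_000):
        pairs = pairwise_distances(n)
        gb = dense_matrix_bytes(n) / 1e9
        print(f"{n:>9,} seqs -> {pairs:,} distances, ~{gb:,.0f} GB at 4 bytes each")
```

For 1,000,000 sequences this gives 499,999,500,000 distances, roughly 2 TB under the 4-byte assumption, which is why the number of uniques going into clustering matters so much.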

Kendra · June 5, 2015, 5:14pm

cluster.split will work if you have enough RAM. As I said, ONE processor with access to 128 GB RAM worked for a similarly sized dataset. Multiple processors mean multiple copies of the RAM requirement. My big clustering job maxed out at around 90 GB RAM; had I tried to use 2 processors, I would have needed 180 GB and the computer would have hung. Before my lab got the server with serious RAM, I tried using an SSD as swap to increase virtual memory, but that didn't work; it still hung. So rerun cluster.split on a single processor with as much RAM as you can get your hands on.
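Kendra's rule of thumb can be written as a quick check. This minimal sketch assumes, as she describes, that each processor may need roughly the same peak RAM as a single-processor run, so total demand scales linearly with the processor count. The 90 GB and 128 GB figures come from her post; the function names are hypothetical and not part of mothur.

```python
def required_ram_gb(peak_gb_per_process: float, processors: int) -> float:
    """Total RAM needed if every process peaks at the same memory (assumption)."""
    return peak_gb_per_process * processors


def max_processors(peak_gb_per_process: float, available_gb: float) -> int:
    """Largest processor count whose combined peak still fits in available RAM."""
    return int(available_gb // peak_gb_per_process)


if __name__ == "__main__":
    peak, available = 90.0, 128.0  # figures from Kendra's post
    for procs in (1, 2):
        need = required_ram_gb(peak, procs)
        verdict = "fits" if need <= available else "exceeds RAM (likely to hang)"
        print(f"{procs} processor(s): ~{need:.0f} GB -> {verdict}")
    print("max processors:", max_processors(peak, available))
```

With those numbers only one processor fits. The parallel jobs may not all peak at the same size, so this linear estimate is best read as a conservative worst case.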

Just curious, what are your samples, that you're getting a million pre.clustered uniques from just 24 samples? I had hundreds of soils.

pschloss · June 16, 2015, 1:15pm

It's unlikely that a 1M x 1M matrix will ever make it through. You can try cluster.split, but I still have my doubts. You should probably read this:

http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix%3F/
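cluster.split divides the sequences into groups (for example by taxonomy) and clusters each group's distances separately, which can shrink the total number of distances compared with one n x n matrix. The sketch below uses made-up group sizes purely for illustration; it is not a model of mothur's splitting logic. It also shows the catch: if one group holds most of the sequences, that group still dominates the cost.

```python
from math import comb


def total_pairs(group_sizes: list[int]) -> int:
    """Distances needed when each group is clustered on its own: sum of C(k, 2)."""
    return sum(comb(k, 2) for k in group_sizes)


if __name__ == "__main__":
    n = 1_000_000
    whole = comb(n, 2)  # one unsplit n x n matrix
    # Hypothetical split: ten equally sized groups.
    even = [n // 10] * 10
    # Hypothetical split where one group holds most of the sequences.
    skewed = [900_000] + [10_000] * 10
    for label, groups in (("even", even), ("skewed", skewed)):
        assert sum(groups) == n
        pairs = total_pairs(groups)
        print(f"{label}: {pairs:,} pairs ({pairs / whole:.1%} of {whole:,})")
```

The even split needs about 10% of the unsplit distances, while the skewed split still needs about 81%.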

ci.keating1 · November 24, 2015, 10:40am

Hi everyone,

I'm not sure if I'm encountering the same problem. The cluster.split command is running, but it has been stuck on the clustering step for one fasta file for two days. Since it hasn't crashed, I don't want to kill it, but I'm unsure whether this is a problem. I ran the command using 8 processors and my machine has 64 GB of RAM, so maybe I should try with one processor? It's stuck clustering "cod index.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.0.dist", and when I look at this file it's 274 GB. Should I just kill this process, or does it simply take a lot of time?

I did have a large dataset, but after quality control, chimera removal, etc., I was working with 272,514 sequences. Obviously this is still a really large number of uniques. My data are from 68 cod gut samples from different feeding regimes.

Thanks for your help,

Ciara

pschloss · November 24, 2015, 1:35pm

I’d let it sit. I’ve got datasets that take a week or more to run.

Pat
