Cluster.split – limit of sequences it can handle?

mkaestli · August 18, 2014, 12:40am

Hi Pat and Sarah,

Following the MiSeq SOP, I’m trying to cluster 2,160,000 unique sequences (total of 13 million sequences - amplicon size 350 bp) of 219 samples. I’m running the cluster.split command (taxlevel=4, cutoff=0.15, no use of “large=T”) with 64 cores and 400G RAM (of which it is using 100G). The program has been running for 12 days and I’m a bit worried it won’t finish.

Hence my question: is there a maximum number of sequences that cluster.split (or mothur in general) can handle, and is there something I could do to speed up the process? We plan to analyse even more samples in the future…

My apologies if this has already been answered on the forum – I realise there have been a few posts on this subject but I couldn’t find one that fits the above.

Thanks!
Miriam

pschloss · August 18, 2014, 7:06pm

It will probably continue running for a very long time. You can try using taxlevel=5 or 6.
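For readers following along, the taxlevel suggestion might look roughly like the command below. This is only a sketch: the file names (final.fasta, final.count_table, final.taxonomy) are placeholders, not files named in this thread, and it assumes the taxonomy-based split (splitmethod=classify) with a count file, as in the MiSeq SOP. The cutoff and processor count are the ones quoted in the question.

```
# Placeholder file names; substitute your own outputs from the earlier SOP steps.
# taxlevel=5 splits the sequences into more, smaller taxonomic bins than
# taxlevel=4, so each bin has fewer pairwise distances to compute and cluster.
cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, splitmethod=classify, taxlevel=5, cutoff=0.15, processors=64)
```

If the run is still too slow, the same command with taxlevel=6 would split the data further, at the cost of never grouping sequences from different bins at that level into one OTU.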

The problem is that you have 350 bp amplicons, which indicates that your reads do not fully overlap with each other and that you likely have a high error rate. This will inflate the number of unique sequences you have (as well as the number of OTUs and the distance between samples). I strongly encourage people to use the V2 chemistry and sequence the V4 region to get proper denoising of your data.

Pat

mkaestli · August 20, 2014, 12:01am

Thanks Pat!

We used the V2 and V3 chemistry and covered the V4/5 region.

I’ll try taxlevel 5 or 6. If that fails - while not ideal, would it be possible to remove rare/singleton sequences before cluster.split (being aware that not all erroneous sequences are rare and not all rare sequences are erroneous but it would reduce spurious OTUs given our likely high error rate…) e.g. by using split.abund (remove.rare?)?
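As a rough illustration of the singleton-removal idea raised here (not something recommended elsewhere in this thread), split.abund could be run before cluster.split along these lines. The file names are placeholders, and cutoff=1 is an assumed value chosen so that sequences seen only once are treated as rare.

```
# Placeholder file names; cutoff=1 is an assumption meaning "abundance of 1 or less is rare".
# split.abund separates the input into rare and abundant sets of sequences;
# only the abundant set would then be passed on to cluster.split.
split.abund(fasta=final.fasta, count=final.count_table, cutoff=1)
```

The caveat in the post still applies: this drops genuine rare sequences along with erroneous ones, so it changes the data going into the OTU analysis rather than just speeding it up.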

Thanks!
Miriam

pschloss · August 22, 2014, 8:27pm

The v3 chemistry has been a disaster. We’re still advocating sequencing the V4 region with the v2 chemistry.

Pat
