Cluster split command running for days

aarpee · June 16, 2022, 4:34am

I have been trying to analyze 16S rRNA V4 metagenomic sequences to obtain rarefaction curves and other diversity calculations for gastric ulcer, cancer, and healthy stool samples. I have 61 pairs of fastq files.
I’m using a server with 128 GB of RAM and 64 processors.
Every time it reaches the cluster.split step, the process becomes extremely slow; it has now been running for more than a week.
Please help me out, since it’s my first time using mothur.

Alexandre_Thibodeau · June 20, 2022, 7:28pm

Hello!

It normally means you have a lot of unique sequences, most of the time inflated because of high error rates. But normally, sequencing only V4 should not give you so many errors. Could you please post the commands you are using, the number of processors you are using, and the last summary available?

Cheers!

pschloss · June 21, 2022, 8:02pm

As @Alexandre_Thibodeau mentioned, you likely have high error rates. Looking at another of the threads you posted to, you have 2x150 nt reads to sequence the 250 nt V4 region. This means that the reads do not fully overlap, so they cannot denoise each other. Because you have so many uniques, everything will take much longer, likely use more RAM, and contain a lot of sequencing noise. You’ll want to check out this blog post…
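
To make the overlap arithmetic concrete, here is a rough Python sketch. It is only an illustration: `paired_overlap` is a hypothetical helper, not part of mothur, and it assumes the two reads start at opposite ends of the amplicon and ignores primers and quality trimming.

```python
def paired_overlap(read_len: int, amplicon_len: int) -> int:
    """Number of amplicon positions covered by BOTH reads of a pair.

    Assumption: forward and reverse reads start at opposite ends of the
    amplicon; primers and quality trimming are ignored.
    """
    # Each read covers at most the whole amplicon.
    covered = min(read_len, amplicon_len)
    # Two reads of `covered` nt each span 2 * covered positions in total;
    # whatever exceeds the amplicon length is counted twice, i.e. overlaps.
    return max(0, 2 * covered - amplicon_len)


# 2x150 nt reads on a ~250 nt V4 amplicon: only part of the region is
# covered by both reads.
partial = paired_overlap(150, 250)

# 2x250 nt reads on the same amplicon: every position is covered twice.
full = paired_overlap(250, 250)

print(f"2x150: {partial} of 250 nt overlap")
print(f"2x250: {full} of 250 nt overlap")
```

With 2x150 reads, only 50 of the 250 positions are read twice, so errors in the remaining 200 positions cannot be corrected by the other read, and each uncorrected error can create a new unique sequence.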

Pat

