Cluster split command running for days

I have been trying to process 16S rRNA V4 metagenomic sequences to obtain rarefaction curves and other diversity calculations for gastric ulcer, cancer, and healthy stool samples. I have 61 pairs of fastq files.
I’m using a server with 128 GB of RAM and 64 processors.
Every time it reaches the cluster.split step, the process becomes extremely slow; it has now been running for more than a week.
Please help me out, since this is my first time using mothur.

Hello!

This usually means you have a lot of unique sequences, most often inflated by a high error rate. Normally, though, sequencing only V4 should not give you that many errors. Could you please post the commands you are using, the number of processors you are using, and the most recent summary output you have?

Cheers!

As @Alexandre_Thibodeau mentioned, you likely have high error rates. Looking at another of the threads you posted in, you have 2x150 nt reads covering the 250 nt V4 region. This means the reads do not fully overlap, so they cannot denoise each other. Because you have so many unique sequences, everything will take much longer, likely use more RAM, and contain a lot of sequencing noise. You’ll want to check out this blog post…
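
To make the overlap point concrete, here is a rough back-of-the-envelope sketch in Python. It is not a mothur command; the function name `overlap_summary` is hypothetical, and it assumes a simplified model in which each read starts at one end of a 250 nt amplicon, with no primer or quality trimming.

```python
def overlap_summary(read_len: int, amplicon_len: int) -> dict:
    """Estimate how much of an amplicon is covered by both paired reads.

    Simplified model (an assumption): the forward read starts at one end of
    the amplicon, the reverse read at the other, and nothing is trimmed.
    """
    # A read cannot cover more bases than the amplicon contains.
    usable = min(read_len, amplicon_len)
    # Positions seen by both reads (the only ones where reads can correct each other).
    both = max(0, 2 * usable - amplicon_len)
    # Positions seen by exactly one read (no second opinion on base calls).
    one = min(amplicon_len, 2 * usable) - both
    # Positions seen by neither read (only if the reads fail to meet).
    gap = amplicon_len - both - one
    return {"both_reads": both, "one_read": one, "uncovered": gap}


if __name__ == "__main__":
    # 2x150 nt reads on a 250 nt region, as in this thread.
    print(overlap_summary(150, 250))
    # 2x250 nt reads on the same region, for comparison.
    print(overlap_summary(250, 250))
```

Under these assumptions, 2x150 nt reads leave only about 50 of the 250 positions covered by both reads, while 2x250 nt reads would cover every position twice. Errors in the singly covered positions cannot be corrected by the other read, which is how the number of uniques gets inflated.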

Pat
