Cluster.split does not complete clustering

While doing my analysis I ran the cluster.split command, but it did not complete the clustering process. Details are given below.
The screenshot below shows the distances being calculated for each group.

The distance calculation completed as shown below, and the clustering process then started:

The clustering process never completed. It stopped at the point shown below, and nothing happened after that.

It’s probably still running. I suspect you have a large distance matrix that it is still processing.
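To see why a large matrix takes so long: n unique sequences have n(n-1)/2 possible pairwise distances, so the work grows roughly with the square of the number of unique sequences. A minimal Python sketch of that arithmetic, using hypothetical sequence counts (mothur's dist.seqs can skip distances above a cutoff, so this is an upper bound on what actually lands in the distance file):

```python
def pairwise_distance_count(n_unique: int) -> int:
    """Number of distinct sequence pairs among n_unique sequences: n(n-1)/2."""
    return n_unique * (n_unique - 1) // 2


# Hypothetical numbers of unique sequences, for illustration only.
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} unique sequences -> {pairwise_distance_count(n):,} pairs")
```

Going from 100,000 to 1,000,000 unique sequences multiplies the number of pairs by about 100, not 10.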



Yes, it looks like it’s still running. I left it for a whole day and it showed no progress. I repeated the run and the same thing happened again. So this is what I did next:
First, I generated a summary of the files that will be used later:

Then I ran the cluster.split command as follows (this time with the “cluster=f” parameter):

This time it calculated the distances for each group, and I got the following files (only a few of the output files are shown in the screenshot):

Next, I ran the following command, and the clustering process started:

It has stalled at “pick.86.dist.temp”.
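Since the run stalls on one group's file, it may help to check which per-group distance files are the largest, because the biggest group is a plausible bottleneck. A minimal Python sketch, assuming the per-group files sit in one directory and match a hypothetical glob pattern `*.dist*` (a name such as pick.86.dist.temp would match); the directory and pattern are placeholders, not mothur settings:

```python
from pathlib import Path


def largest_files(directory: str, pattern: str = "*.dist*") -> list[tuple[str, int]]:
    """Return (file name, size in bytes) for files matching pattern, largest first."""
    files = [p for p in Path(directory).glob(pattern) if p.is_file()]
    return sorted(
        ((p.name, p.stat().st_size) for p in files),
        key=lambda item: item[1],
        reverse=True,
    )


if __name__ == "__main__":
    # "." is a placeholder for the directory where cluster.split wrote its output.
    for name, size in largest_files(".")[:10]:
        print(f"{size / 1e9:8.2f} GB  {name}")
```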

I had already tried the cluster.split command without the cluster=f parameter, and the same thing happened when the clustering process started (as posted above). The software did not crash or stop; it’s just that nothing happens after this step. I want to generate an OTU table. Please suggest what I should do next.

You need to wait longer. If it sits like this, it is still processing. If I recall correctly, you have a very large number of sequences, so it’s going to take time.


Sir, I will wait for it, but isn’t there an alternative? At this step I have the following two files:

The “final.dist” file is 101.84 GB. The file-splitting step was completed using mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=4, cluster=f, processors=8). I am now trying to cluster the sequences into OTUs with the following command:
mothur > cluster.split(file=final.file, processors=4)
[WARNING]: When using the file option, it is recommended you include the name or count file. Doing so will ensure the OTUs are printed by OTU size reflecting the redundant reads, instead of just the unique reads.
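One way to gauge how much data the clustering step faces is to count the lines in final.dist. Assuming it is a column-format distance file (one `seqA seqB distance` entry per line), the line count approximates the number of stored distances. A minimal Python sketch that streams the file in binary chunks so a 100 GB file never has to fit in memory; the function name and the 64 MB chunk size are arbitrary choices for illustration:

```python
def count_lines(path: str, chunk_size: int = 64 * 1024 * 1024) -> int:
    """Count newline characters by streaming the file in fixed-size binary chunks."""
    total = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            total += chunk.count(b"\n")
    return total


if __name__ == "__main__":
    import sys

    # Usage: python count_lines.py final.dist
    print(f"{count_lines(sys.argv[1]):,} lines")
```

On a file this size the count is limited by disk speed, so expect it to take a while.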

The clustering step does not complete, i.e., no final mcc.list file is produced. Is this because of the warning?

No, the warning shouldn’t be relevant here. I suspect you have noisy data that is causing things to take a long time. You could try a higher diffs value in pre.cluster, and you could also try taxlevel=5 or taxlevel=6 in cluster.split.
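As a rough illustration of why a deeper taxlevel can help: cluster.split clusters each taxonomic group separately, so the cost is driven by the within-group pair counts rather than by one all-against-all matrix. The Python sketch below compares a coarser and a finer split of the same sequences; the group names and sizes are invented for illustration, and the pair count is only a rough proxy for clustering time.

```python
def pair_count(n: int) -> int:
    """Distinct pairs among n unique sequences: n(n-1)/2."""
    return n * (n - 1) // 2


def split_cost(groups: dict[str, int]) -> tuple[int, int]:
    """Return (total within-group pairs, pairs in the largest group)."""
    counts = [pair_count(n) for n in groups.values()]
    return sum(counts), max(counts)


# Hypothetical unique-sequence counts per taxon; all names and sizes are invented.
coarser = {"order_A": 300_000, "order_B": 50_000}
# The same 350,000 sequences split one level deeper (still hypothetical).
finer = {"family_A1": 120_000, "family_A2": 100_000, "family_A3": 80_000, "family_B1": 50_000}

for label, groups in (("coarser split", coarser), ("finer split", finer)):
    total, largest = split_cost(groups)
    print(f"{label}: {total:,} pairs in total, {largest:,} in the largest group")
```

With these made-up sizes, the largest group drops from about 45 billion pairs to about 7.2 billion. A higher diffs value in pre.cluster works on the same quantity from the other side, by reducing the number of unique sequences that enter each group.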

Again, I’d encourage you to read the blog post that I included previously.

