Cluster.split does not complete clustering

Hira · May 2, 2022, 1:11pm

Hi,
While doing my analysis I ran the cluster.split command, but it did not complete the clustering process. Details are given below.
In the picture below, the distances were calculated for each group.

The distance calculations completed as shown below, and the clustering process then started as follows:

The clustering process did not complete. It stopped at the point shown below, and nothing has happened since.

pschloss · May 3, 2022, 5:41pm

It’s likely still running. I suspect you have a big distance matrix that it is trying to process.

Pat

Hira · May 3, 2022, 7:16pm

Yes, it looks like it’s still running. I left it like that for a whole day and it showed no progress. I repeated the run and the same thing happened again. So what I did now is as follows.
I first generated a summary of the files that are going to be used later:

Then I ran the cluster.split command as follows (this time I used the “cluster=f” parameter):

This time it calculated the distances for each group and I got the following files (only a few of the output files are shown in the screenshot):

After this I ran the following command (the clustering process started):

It has stopped here at “pick.86.dist.temp”

I had already tried the cluster.split command without the cluster=f parameter, and in that run the same thing happened once the clustering process started (as posted above). The software did not crash or stop working; it is just that nothing happens after this step. I want to generate an OTU table. Please suggest what I should do next.
Thanks
Hira

pschloss · May 5, 2022, 5:42pm

You need to wait longer. If it sits like this it is still processing. My recollection is that you have a ton of sequences. It’s going to take time.

Hira · May 6, 2022, 1:41pm

Sir, I will wait for it, but isn’t there any alternative option for this? At this step I have the following two files:

The “final.dist” file is 101.84 GB. The file-splitting step was completed using
mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=4, cluster=f, processors=8)
I am trying to cluster the sequences into OTUs using the following command:
mothur > cluster.split(file=final.file, processors=4)
[WARNING]: When using the file option, it is recommended you include the name or count file. Doing so will ensure the OTUs are printed by OTU size reflecting the redundant reads, instead of just the unique reads.

The clustering step is not completing, i.e. the final mcc.list file is not produced. Is it because of this warning?
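
For reference, the warning itself says how to silence it: pass the count file together with the file option. A minimal sketch, assuming final.count_table (the count file from the splitting step quoted above) is the one that matches final.file; this only affects how the OTUs are ordered and is not expected to change how long clustering takes:

```text
# Assumption: final.count_table is the count file that was used when final.file was generated
cluster.split(file=final.file, count=final.count_table, processors=4)
```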

pschloss · May 10, 2022, 4:20pm

No, the warning shouldn’t be relevant to this. I suspect you have noisy data that is causing things to take a long time. You could try using a higher diffs value in pre.cluster, and you could also try using taxlevel=5 or taxlevel=6 in cluster.split.
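
As a rough sketch of those two suggestions, something like the following could be tried. The pre.cluster input names (aligned.fasta, aligned.count_table) are hypothetical placeholders, and diffs=3 is only an illustrative value, since the thread does not show which files or which diffs setting were used originally. Rerunning pre.cluster would also mean repeating the steps that come after it in the pipeline before the final.* files exist again.

```text
# Hypothetical input names; substitute the files that were originally given to pre.cluster
pre.cluster(fasta=aligned.fasta, count=aligned.count_table, diffs=3)

# Same splitting command as quoted earlier in the thread, but at a finer taxonomic level
cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=5, cluster=f, processors=8)
```

A higher taxlevel splits the sequences into more, smaller groups, so each distance file that has to be clustered should be smaller.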

Again, I’d encourage you to read the blog post that I included previously.

Pat
