Cluster.split leaving temp files behind, but no error message

SJSalter · May 29, 2021, 7:58am

Hello, I’ve been using cluster.split with taxonomy to try to cluster my dataset (166k unique sequences), which contains about 223 Order-level taxa. I’m using a PC with 8 processors.

cluster.split(column=file.dist, count=file.count_table, large=T, cutoff=0.03, runsensspec=f, taxonomy=file.taxonomy, splitmethod=classify)

It appears to finish after about 13 hours and writes the output file, but there are still 7 small count_table temp files left over in the directory. When I try to run make.shared I get an error message saying:

“Your group file contains 166381 sequences and list file contains 166360 sequences. Please correct.”

There are 21 read names in the temp files, which matches the 21-sequence difference in the error, so I assume there was an unflagged problem during merging.
Is there any way to recover those reads without rerunning the whole clustering step?
Is this something I can prevent happening again in future?
Thank you.

westcott · June 1, 2021, 6:10pm

It looks like we have a small bug in the split by distance option. Here’s a workaround for your current dataset. The .temp count files contain reads that should be singletons in your list file. Rather than rerunning the entire cluster.split command, you can manually add the remaining 21 reads to your list file as singletons.

NOTE: Be sure to update the number of OTUs in the second column of the list file, as well as adding OTUxxxx labels to the header line for each new OTU added.
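
If editing the list file by hand feels error-prone, the same fix can be scripted. Below is a minimal Python sketch, not part of mothur. It assumes the list file is tab-separated with a header line (label, numOtus, then one OTU label per column) followed by one data line per cutoff label, and that each temp count file has the read name in its first column under a header starting with Representative_Sequence. The function names are made up for illustration. Work on a copy of your list file.

```python
import re

def names_from_count_lines(lines):
    """Collect read names (first column) from the lines of a temp count file."""
    names = []
    for line in lines:
        line = line.strip()
        # Assumption: skip blanks, comment lines, and the count-table header
        if not line or line.startswith("#") or line.startswith("Representative_Sequence"):
            continue
        names.append(line.split("\t")[0])
    return names

def add_singletons(header, row, names):
    """Append each name as a singleton OTU to one header/data line pair."""
    head = header.rstrip("\n").split("\t")
    cols = row.rstrip("\n").split("\t")
    # Skip reads that already appear in an OTU, and drop repeated names
    present = {n for otu in cols[2:] for n in otu.split(",")}
    new = [n for n in dict.fromkeys(names) if n not in present]
    # Continue numbering from the last label, keeping its prefix and zero padding
    m = re.fullmatch(r"(\D*)(\d+)", head[-1])
    prefix, last, width = m.group(1), int(m.group(2)), len(m.group(2))
    for i, name in enumerate(new, start=1):
        head.append(f"{prefix}{last + i:0{width}d}")
        cols.append(name)
    # Second column holds the OTU count, so recompute it from the columns
    cols[1] = str(len(cols) - 2)
    return "\t".join(head), "\t".join(cols)
```

To use it, read the names from each leftover temp file with names_from_count_lines, pass the combined list plus the header and data line of your list file to add_singletons, and write the two returned lines to a new list file. If the 21 reads are the only ones missing, make.shared should then see the same number of sequences in the list and count files.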

Splitting by distance is the slowest option for cluster.split. We recommend giving cluster.split the fasta, taxonomy, and count files instead, as that combination takes advantage of parallelized processing during the splitting process. In the future, try this instead:

mothur > cluster.split(fasta=file.fasta, count=file.count_table, cutoff=0.03, runsensspec=f, taxonomy=file.taxonomy, splitmethod=classify)

Kindly,
Sarah

