Hello, I’ve been using cluster.split with taxonomy to try to cluster my dataset (166k unique sequences, spanning about 223 Order-level taxa). I’m running on a PC with 8 processors:
cluster.split(column=file.dist, count=file.count_table, large=T, cutoff=0.03, runsensspec=f, taxonomy=file.taxonomy, splitmethod=classify)
It appears to finish after about 13 hours and writes the output file, but 7 small count_table temp files are left behind in the directory. When I then try make.shared, I get this error:
“Your group file contains 166381 sequences and list file contains 166360 sequences. Please correct.”
There are 21 read names across those temp files, so I assume something went wrong silently during the merging step.
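In case it’s useful, this is roughly the check I ran to confirm which reads are in the count_table but missing from the list file. It’s only a sketch, and it assumes the standard mothur formats (count_table: one header row, then one sequence name per line in the first column; list file: the last line holds the finest label, with a label column, a numOtus column, and then one tab-separated column per OTU, each a comma-separated list of names):

```python
def missing_reads(count_path, list_path):
    """Return names present in the count_table but absent from the list file.

    Assumes standard mothur formats: count_table has a single header row
    with sequence names in the first column; the list file's final line is
    the finest clustering level (label, numOtus, then one column per OTU).
    """
    with open(count_path) as fh:
        rows = [line for line in fh if line.strip()]
    # Skip the "Representative_Sequence  total ..." header row.
    count_names = {line.split("\t")[0] for line in rows[1:]}

    with open(list_path) as fh:
        last = [line for line in fh if line.strip()][-1]
    # Drop the label and numOtus columns; each remaining column is one OTU.
    otus = last.rstrip("\n").split("\t")[2:]
    list_names = {name for otu in otus for name in otu.split(",")}

    return sorted(count_names - list_names)
```

Running this against my files returns exactly the 21 names I see in the leftover temp count_tables.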
Is there any way to recover those reads without rerunning the whole clustering step? And is there anything I can do to prevent this from happening again in future?