cluster.split killed


I’ve been trying to analyse an Illumina dataset of 120 samples (bumblebee gut microbiota) by following the mothur MiSeq SOP. The total number of sequences after removing chimeras, mitochondria and other undesirables is 339144 (from the stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.summary file). The phylotype commands work, but the OTU commands do not. Using 19 processors, the cluster.split command runs for a long time but is eventually always killed. It only completes when I try it with a very small subset (4 of the 120 samples).
The first part of cluster.split does seem to run, because the logfile says: It took 43250 seconds to split the distance file. It then starts the ‘reading matrix’ step, which goes on for a while, until it doesn’t…

Does anyone have any ideas about how to solve this problem?


When cluster.split fails, it is usually due to a RAM issue. The more processors you use, the more memory is required, so try running the command with fewer processors. You might also try splitting by taxonomy instead of by distance: cluster.split(fasta=final.fasta, taxonomy=final.taxonomy, name=final.names, taxlevel=3).
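Both suggestions can be combined in one call. This is only a sketch: the file names are the ones from the command above, and processors=4 is an illustrative value, not one tested on this dataset. If the job is still killed, lower it further.

```text
cluster.split(fasta=final.fasta, taxonomy=final.taxonomy, name=final.names, taxlevel=3, processors=4)
```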
