pre.cluster is very slow

wbrazelton · August 8, 2017, 6:49pm

I have been using pre.cluster for various projects for years, and suddenly it has become ~600x slower. Previously we could process 20,000 sequences with pre.cluster in a few minutes. Now it takes >30 hours. The shift from fast to slow did not correspond to any change that we made to our system, as far as we can tell, and we get the same slow speed with mothur v.1.36.1 as well as v.1.39.5. We also have the same experience whether we run pre.cluster in an interactive session or as a batch script. This really sounds like a weird computational issue on our end (we have a mini-cluster running CentOS), but we haven’t seen this slowdown with any other programs, nor with any other mothur commands. I’m completely stumped, so I’m posting here in case anyone has any insight. Thanks.

pschloss · August 10, 2017, 12:13pm

Hey,

Hmmm, it should be faster since we’ve made some speedups in the last year or so. Are you running pre.cluster in group mode? Are you sure you have the right version number of the software?

Pat

Alex_Thibodeau · August 10, 2017, 6:22pm

I do not know if this is related, but on a Compute Canada cluster (SLURM workload manager), I cannot complete pre.cluster at all. It crashes at the very end because, for some reason, sequences are not being found in the count table, even when using processors=1.

Have you tried it on a regular machine?

Everything runs very smoothly on my machine at work, but on the cluster it is an entirely different story.

westcott · August 11, 2017, 2:43pm

In version 1.36 we made a change to allow you to cluster unaligned sequences. Running pre.cluster with unaligned sequences takes longer because mothur must do a pairwise alignment before attempting to cluster. Could that be the source of the slowdown?
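To give a rough sense of why the unaligned case costs more, here is a minimal Python sketch. It is not mothur's code: the function names are hypothetical, and a unit-cost edit distance stands in for the pairwise alignment (mothur's actual alignment method and scoring are not shown). With aligned input, comparing two sequences is a single pass over the columns; with unaligned input, each comparison fills a dynamic-programming table whose size grows with the product of the two sequence lengths.

```python
def count_diffs_aligned(a: str, b: str) -> int:
    """Count mismatched columns between two pre-aligned, equal-length sequences.

    Runs in time proportional to the alignment length.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between two unaligned sequences.

    Illustrative stand-in for a pairwise alignment; runs in time
    proportional to len(a) * len(b).
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

For reads a few hundred bases long, the second function does a few hundred times more work per comparison than the first, and that extra cost is paid on every pair of sequences that gets compared.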

wbrazelton · September 13, 2017, 8:36pm

Indeed, processing unaligned sequences was the source of the problem. My (apparently incorrect) assumption was that pre.cluster should work similarly with unaligned or aligned sequences. Would it be reasonable to print a warning to the user when they are providing unaligned sequences? It seems hundreds of times faster to run align.seqs and then pre.cluster than to run pre.cluster with unaligned sequences, so I’m not sure why anyone would ever intentionally want to use pre.cluster with unaligned sequences. Or maybe I’m the only one stupid enough to waste several weeks like this, and such a warning would be unnecessary.
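A check like that could be done outside mothur before launching a long run. The following is a minimal Python sketch, not anything mothur provides: the function names are hypothetical, and it relies on the assumption that sequences in an aligned FASTA file all have the same length (gap characters included), while unaligned reads usually do not.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def looks_aligned(lines) -> bool:
    """Heuristic: the input looks aligned if every sequence has the same length."""
    lengths = {len(seq) for _, seq in read_fasta(lines)}
    return len(lengths) == 1


if __name__ == "__main__":
    # Hypothetical in-memory example; pass an open file handle for real data.
    example = [">seq1", "ACGTAC", ">seq2", "ACGTTTAC"]
    if not looks_aligned(example):
        print("Warning: sequences differ in length and are probably unaligned.")
```

This is only a heuristic. Unaligned sequences that happen to share one length would pass, so it can miss cases, but it would have flagged the mistake described above.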
