pre.cluster is very slow

I have been using pre.cluster for various projects for years, and suddenly it has become ~600x slower. Previously we could process 20,000 sequences with pre.cluster in a few minutes. Now it takes >30 hours. As far as we can tell, the shift from fast to slow did not correspond to any change that we made to our system, and we get the same slow speed with mothur v.1.36.1 as well as v.1.39.5. We also see the same behavior whether we run pre.cluster in an interactive session or as a batch script. This really sounds like a weird computational issue on our end (we have a mini-cluster running CentOS), but we haven’t seen this slowdown with any other programs, nor with any other mothur commands. I’m completely stumped, so I’m posting here in case anyone has any insight. Thanks.

Hey,

Hmmm, it should be faster since we’ve made some speedups in the last year or so. Are you running pre.cluster in group mode? Are you sure you have the right version number of the software?

Pat

I do not know if this is related, but on a Compute Canada cluster (SLURM workload manager) I cannot complete pre.cluster at all. It crashes at the very end because, for some reason, sequences are not being found in the count table, even when using processors=1.

Have you tried it on a regular machine?

Everything runs very smoothly on my workplace machine, but on the cluster it is an entirely different story.

In version 1.36 we made a change to allow you to cluster unaligned sequences. Running pre.cluster with unaligned sequences takes longer because mothur must do a pairwise alignment before attempting to cluster. Could that be the source of the slowdown?

Indeed, processing unaligned sequences was the source of the problem. My (apparently incorrect) assumption was that pre.cluster should work similarly with unaligned or aligned sequences. Would it be reasonable to print a warning to the user when they are providing unaligned sequences? It seems hundreds of times faster to run align.seqs and then pre.cluster than to run pre.cluster with unaligned sequences, so I’m not sure why anyone would ever intentionally want to use pre.cluster with unaligned sequences. Or maybe I’m the only one stupid enough to waste several weeks like this, and such a warning would be unnecessary.
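
For anyone who wants to catch this before launching a long job, here is a minimal sketch of a pre-flight check in Python. It rests on an assumption that is not taken from mothur itself: an aligned fasta file (e.g. the output of align.seqs) has every sequence padded to the same length, so mixed lengths suggest unaligned input. The function names and the command-line usage are hypothetical, and this is only a heuristic, not how mothur decides internally.

```python
import sys


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def looks_aligned(lines):
    """Heuristic (assumption): aligned sequences all share one length.

    Returns False for an empty file or for mixed sequence lengths.
    """
    lengths = {len(seq) for _, seq in read_fasta(lines)}
    return len(lengths) == 1


if __name__ == "__main__":
    # Hypothetical usage: python check_aligned.py my_sequences.fasta
    with open(sys.argv[1]) as handle:
        if looks_aligned(handle):
            print("All sequences have the same length; input looks aligned.")
        else:
            print("Sequence lengths differ; input looks unaligned. "
                  "Consider running align.seqs before pre.cluster.")
```

If the check reports differing lengths, running align.seqs first and then pre.cluster on the aligned output is the faster route described above.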
