sens.spec call as part of new cluster.split in 1.39 causing crash

amumford · January 26, 2017, 5:37pm

Hi Folks,
Have been re-running a v4 dataset (77 samples, ~150K reads/sample) through 1.39 to benchmark against 1.38.1.
Overall, I’m really, really impressed with the speedup (thanks Pat and Sarah!!!), but it seems like cluster.split is calling sens.spec after it finishes clustering and merging the clustered files, and that step seems to run the server out of RAM (128 GB) and crash.
These data ran fine on this server in 1.38.1—so I don’t think this is an issue with oversized distance matrices. . .
I’m not seeing much in the documentation about what sens.spec does—looks like it’s scoring the quality of the OTU calls?
Thanks for looking into this, and for all you do for the microbial ecology community!

-Adam Mumford

last bit of the log before it goes down:

It took 4593 seconds to cluster
Merging the clustered files…
It took 14 seconds to merge.
/******************************************/
Running command: sens.spec(list=PASFUOG_Spring_2016.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.opti_mcc.unique_list.list, column=PASFUOG_Spring_2016.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist, count=PASFUOG_Spring_2016.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table)

NOTE: sens.spec assumes that only unique sequences were used to generate the distance matrix.

pschloss · January 30, 2017, 1:56pm

Glad you like it! Sarah will follow up with you, but you might try it without doing cluster.split. In our testing (see the preprint), it wasn’t necessarily faster and the output was a little worse than using the normal cluster command.

Pat

amumford · January 30, 2017, 4:53pm

Holy $^&! that’s fast.
Looks like it works just fine running from cluster instead of cluster.split. I was about to complain about it using only one processor—but then it finished before I could compose any sort of question.
At the risk of derailing the conversation and getting into the weeds: it looked like it lit off vsearch during the call to cluster without being specifically asked. Is OptiClust using one of the vsearch algorithms in some way that’s not clear to me from the preprint?
Thanks!
-Adam

from the log: mothur > cluster(fasta=PASFUOG_Spring_2016.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=PASFUOG_Spring_2016.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table, cutoff=0.15, processors=24)

Using 24 processors.
[NOTE]: Default clustering method has changed to opti. To use average neighbor, set method=average.
[WARNING]: You can only use the processors option when using the agc or dgc clustering methods. Using 1 processor.
./home/amumford/mothur-1.37.5/mothurvsearch file does not exist. Checking path…
Found vsearch in your path, using /usr/local/bin/vsearch
It took 509 seconds to cluster

pschloss · February 1, 2017, 9:01pm

Hmmm, no it’s not using vsearch. We’ll check on that error message.

amumford · February 2, 2017, 10:11pm

Quick update—
I seem to have gotten it to run by going from 24 to 20 processors—maybe that took the load off the RAM and let it finish?
Also—cluster appears to launch vsearch when it’s not explicitly given a distance matrix to start, otherwise it’ll run the new opti.
I’m wondering—is there any speed/memory benefit to running cluster.split to get the distance matrix rather than running dist.seqs?
Thanks for all the work!
-Adam

westcott · February 6, 2017, 4:53pm

We released version 1.39.1, https://github.com/mothur/mothur/releases/tag/v1.39.1. It includes a parameter, runsensspec, that lets you indicate whether you want to run the sens.spec command on the completed list file. You can set runsensspec=F to skip this step. For the vsearch question, could you post the cluster.split command you ran and the output?
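
For anyone landing here later, a minimal sketch of what skipping the step might look like in a batch file. Only runsensspec=F comes from this thread; the file names (final.fasta, final.count_table, final.taxonomy) and the splitmethod, taxlevel, and cutoff values are placeholder assumptions in the style of the MiSeq SOP, not Adam's actual command, so substitute your own.

```
# Assumed placeholder inputs: final.fasta, final.count_table, final.taxonomy
# runsensspec=F (available from 1.39.1) skips the sens.spec call on the merged list file
cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.03, runsensspec=F)
```

If the quality metrics are still wanted, sens.spec could be run separately afterwards on a machine with more memory, using the same list, column, and count inputs shown in the log above.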

amumford · February 22, 2017, 9:31pm

Sorry for the delay on getting back to you on this. . .here’s where it called ‘vsearch’ when asked for ‘opti’. I’m realizing now that I needed to run dist.seqs first if I wanted opti to have something to cluster…
Thanks for giving us the option to turn off sensspec, that seems to help.
Cheers,
-Adam

mothur > cluster(fasta=data.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=data.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table, method = opti, cutoff=0.15, processors=24)

Using 24 processors.
[WARNING]: You can only use the processors option when using the agc or dgc clustering methods. Using 1 processor.
./home/amumford/mothur-1.37.5/mothurvsearch file does not exist. Checking path…
Found vsearch in your path, using /usr/local/bin/vsearch
It took 507 seconds to cluster

Output File Names:
data.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.opti.unique_list.list
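
Following Adam's observation that opti needs a distance matrix to work from, the two-step version might look roughly like the sketch below. The file names are placeholder assumptions (final.fasta, final.count_table), and the cutoff of 0.03 is an assumed value rather than the 0.15 used in the log above.

```
# Step 1: compute pairwise distances from the aligned unique sequences
# (with these placeholder names, the column-format output would be final.dist)
dist.seqs(fasta=final.fasta, cutoff=0.03)

# Step 2: cluster from the precomputed distances rather than from the fasta file
cluster(column=final.dist, count=final.count_table, method=opti, cutoff=0.03)
```

Passing column= instead of fasta= is the point of the sketch: going by the logs in this thread, handing cluster a fasta file is what sent it looking for vsearch.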
