Mothur workflow basics and reproducibility

kthman · March 7, 2021, 4:59am

Hi All,
My question concerns the cluster.split step of a mothur workflow that follows a PacBio SOP.

The cluster.split (fasta= ...precluster.pick.pick.fasta,
count= .. denovo.vsearch.pick.pick.count_table,
taxonomy=..*.precluster.pick.nr_v138.wang.pick.taxonomy,
splitmethod=classify, taxlevel=4, cutoff=0.03)

I expect three output files (dist, list, sensspec), created in that order.
Is that right so far?
..* good.unique.good.filter.unique.precluster.pick.pick.dist
..* good.unique.good.filter.unique.precluster.pick.pick.opti_mcc.list
..* good.unique.good.filter.unique.precluster.pick.pick.opti_mcc.sensspec

So my question is: do I need to wait for the sens.spec() part of cluster.split()
to complete? Once I have the *.list file, can I just use it to move on to
make.shared(), classify.otu(), etc.?

The sens.spec() takes two hours or so to run to completion.
The make.shared(), classify.otu, count.groups steps all together take less than
5 minutes.

Is that KOSHER?

Also, when I run the same pipeline on the same data with different processor counts, I see some variation in the sensspec output. Is that normal? How much variation should I see between runs? I found this while benchmarking mothur for a new server. It has been a while since I used mothur (I last used the MPI version).

What is actually happening (analysis-wise) in sens.spec()?

westcott · March 15, 2021, 2:02pm

The sensspec analysis can be skipped by setting the runsensspec parameter to false. You can always run the analysis later using the sens.spec command.

mothur > cluster.split (fasta= . . .precluster.pick.pick.fasta,
count= . . denovo.vsearch.pick.pick.count_table,
taxonomy= . .*.precluster.pick.nr_v138.wang.pick.taxonomy,
splitmethod=classify, taxlevel=4, cutoff=0.03, runsensspec=false)

mothur > sens.spec(list=yourList, column=yourDistanceMatrix, count=yourCountFile)

The sens.spec command calculates the tn, tp, fn and fp values and uses them to evaluate the clusters. The cutoff determines whether two sequences are “close” to or “far” from each other. A true negative (tn) is a pair of reads that are “far” apart and placed in different OTUs. A true positive (tp) is a pair of reads that are “close” and placed in the same OTU. A false negative (fn) is a pair of reads that are “close” but placed in separate OTUs. A false positive (fp) is a pair of reads that are “far” apart but placed in the same OTU.
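To make the four categories concrete, here is a minimal Python sketch of the pairwise counting described above. It is not mothur's implementation. The function name, the toy reads and distances, and the choice to treat a distance equal to the cutoff as "close" are all assumptions made for illustration. Pairs missing from the distance table are treated as "far", which is also an assumption of this sketch.

```python
from itertools import combinations


def confusion_counts(distances, otu_of, cutoff=0.03):
    """Count tp, tn, fp, fn over every pair of reads.

    distances: dict mapping frozenset({read_a, read_b}) -> distance.
               Pairs absent from the dict are treated as "far" (assumption).
    otu_of:    dict mapping read name -> OTU label.
    """
    tp = tn = fp = fn = 0
    for a, b in combinations(sorted(otu_of), 2):
        d = distances.get(frozenset((a, b)))
        close = d is not None and d <= cutoff  # inclusive cutoff is an assumption
        same_otu = otu_of[a] == otu_of[b]
        if close and same_otu:
            tp += 1  # close and together
        elif not close and not same_otu:
            tn += 1  # far and apart
        elif not close and same_otu:
            fp += 1  # far but together
        else:
            fn += 1  # close but apart
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}


# Hypothetical toy data: four reads, three of them in the same OTU.
example_distances = {
    frozenset(("A", "B")): 0.01,
    frozenset(("A", "C")): 0.02,
    frozenset(("B", "C")): 0.05,
    frozenset(("C", "D")): 0.02,
}
example_otus = {"A": "Otu1", "B": "Otu1", "C": "Otu1", "D": "Otu2"}

counts = confusion_counts(example_distances, example_otus)
print(counts)
```

On this toy input, the pairs A-B and A-C are true positives, A-D and B-D are true negatives, B-C is a false positive, and C-D is a false negative.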

The OptiClust method uses the tn, tp, fn and fp values to place reads into OTUs based on the statistic you want to cluster by. The default is mcc (the Matthews correlation coefficient). The sens.spec results are written out at each iteration as mothur searches for the best fit. With the cluster.split command, each split list outputs its own sensspec data. After the clusters are complete, mothur merges the individual lists and runs a final sens.spec analysis on the complete list. Setting runsensspec=false skips that final calculation on the complete list.
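As a rough illustration of the statistic being optimized, the sketch below computes sensitivity, specificity and the Matthews correlation coefficient from the four counts using the standard textbook formulas. The function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch, not something confirmed for mothur.

```python
from math import sqrt


def cluster_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity and MCC for one set of pair counts."""
    # Fraction of "close" pairs that ended up in the same OTU.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Fraction of "far" pairs that ended up in different OTUs.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; 0.0 on a zero denominator (assumption).
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


# Hypothetical counts, for illustration only.
metrics = cluster_metrics(tp=2, tn=2, fp=1, fn=1)
print(metrics)
```

With these counts the MCC is (2*2 - 1*1) / sqrt(3*3*3*3) = 3/9, or about 0.33, and sensitivity and specificity are both 2/3.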

