Mothur workflow basics and reproducibility

Hi All,
This is about the cluster.split step of the mothur workflow in a PacBio SOP.

The command is:

cluster.split(fasta= ...precluster.pick.pick.fasta,
count= ..denovo.vsearch.pick.pick.count_table,
taxonomy=..*.precluster.pick.nr_v138.wang.pick.taxonomy,
splitmethod=classify, taxlevel=4, cutoff=0.03)

I expect to get three output files (dist, list, sensspec), created in that order.
All good so far?
..* good.unique.good.filter.unique.precluster.pick.pick.dist
..* good.unique.good.filter.unique.precluster.pick.pick.opti_mcc.list
..* good.unique.good.filter.unique.precluster.pick.pick.opti_mcc.sensspec

So my question is: do I need to wait for the sens.spec() part of cluster.split()
to complete? Once I have the *.list file, can I just use it to move on to
make.shared(), classify.otu(), etc.?

The sens.spec() step takes two hours or so to run to completion.
The make.shared(), classify.otu(), and count.groups() steps all together take less than
5 minutes.

Is that kosher?

Also, when I run the same pipeline/workflow on the same data with different processor counts, I see some variation in the sensspec output. Is that normal? How much variation should I see between runs? I found this while benchmarking mothur for a new server. It has been a while since I used mothur (I last used the MPI version).

What is actually happening (analysis-wise) in sens.spec()?

The sensspec analysis can be skipped by setting the runsensspec parameter to false. You can always run the analysis later using the sens.spec command.

mothur > cluster.split (fasta= . . .precluster.pick.pick.fasta,
count= . .
denovo.vsearch.pick.pick.count_table,
taxonomy= . .*.precluster.pick.nr_v138.wang.pick.taxonomy,
splitmethod=classify, taxlevel=4, cutoff=0.03, runsensspec=false)

mothur > sens.spec(list=yourList, column=yourDistanceMatrix, count=yourCountFile)

The sens.spec command calculates the tn, tp, fn and fp values and uses them to evaluate the clusters. The cutoff determines whether a given pair of sequences counts as “close” or “far”. A true negative (tn) is a pair of reads that are “far” apart and placed in different OTUs. A true positive (tp) is a pair of reads that are “close” and placed in the same OTU. A false negative (fn) is a pair of reads that are “close” but placed in separate OTUs. A false positive (fp) is a pair of reads that are “far” but placed in the same OTU.
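To make the four categories concrete, here is a minimal Python sketch of the pairwise counting. It is not mothur's implementation. The function name, the dictionary layout for distances, and the toy data are all hypothetical, and it assumes that “close” means a distance at or below the cutoff and that every unique sequence has weight 1 (it ignores the abundances in a count file).

```python
from itertools import combinations


def confusion_counts(distances, otu_of, cutoff=0.03):
    """Count (tp, tn, fp, fn) over all pairs of sequences.

    distances: dict mapping frozenset({a, b}) -> pairwise distance
               (hypothetical layout; pairs missing from it are treated as far)
    otu_of:    dict mapping sequence name -> OTU label
    """
    tp = tn = fp = fn = 0
    for a, b in combinations(sorted(otu_of), 2):
        # Assumption: "close" means distance <= cutoff.
        close = distances.get(frozenset((a, b)), 1.0) <= cutoff
        same_otu = otu_of[a] == otu_of[b]
        if close and same_otu:
            tp += 1  # close and clustered together
        elif not close and not same_otu:
            tn += 1  # far and kept apart
        elif close and not same_otu:
            fn += 1  # close but split into separate OTUs
        else:
            fp += 1  # far but lumped into the same OTU
    return tp, tn, fp, fn


# Toy data, invented for illustration only.
toy_distances = {
    frozenset(("A", "B")): 0.01,
    frozenset(("A", "C")): 0.02,
    frozenset(("B", "C")): 0.05,
    frozenset(("A", "D")): 0.20,
    frozenset(("B", "D")): 0.30,
    frozenset(("C", "D")): 0.25,
}
toy_otus = {"A": 1, "B": 1, "C": 2, "D": 2}

print(confusion_counts(toy_distances, toy_otus, cutoff=0.03))
```

In the toy data, A and C are close but sit in different OTUs (one fn), while C and D are far but share an OTU (one fp).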

The OptiClust method uses the tn, tp, fn and fp values to place reads into OTUs based on the statistic you want to cluster by. The default is mcc. The sens.spec results are output at each iteration as mothur searches for the best fit. With the cluster.split command, each split list outputs its own sensspec data. After the clusters are complete, mothur merges the individual lists and runs a final sens.spec analysis on the complete list. Setting runsensspec=false skips that final calculation on the complete list.
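For reference, mcc is the Matthews correlation coefficient, which combines the four counts into a single score between -1 and 1. The sketch below uses the standard textbook formula. The function name is hypothetical, and returning 0.0 when the denominator is zero is a convention chosen here, not necessarily what mothur does.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from pairwise confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumption: define the score as 0.0 when it is otherwise undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Illustrative counts only: 1 tp, 3 tn, 1 fp, 1 fn.
print(mcc(tp=1, tn=3, fp=1, fn=1))
```

A clustering with no false positives or false negatives scores 1.0, so a higher mcc means the OTU assignments agree more closely with the distance cutoff.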
