mothur

Distance matrix issues - still running

I was unable to run the cluster.split command because of multiple crashes (see the archived message from July 12), so it was recommended that I instead run dist.seqs followed by the cluster command. I tried that, and the process has now been running for more than 55 days on a high-performance computing cluster.

This is a large data set with over 400 samples. Are there any suggestions for how to get this processed, or alternatives to splitting up the data set and processing the pieces separately, so that I can move past this point? I have run a number of data sets (PE150 and PE250) through this workflow and have never run into this issue. Singletons are still in the data set, and I have wondered whether they might be the hold-up.
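One quick way to see how many singletons remain is to read the count_table that mothur writes. A minimal Python sketch, assuming a tab-separated count_table whose first non-comment line is a header and whose second column is each unique sequence's total abundance (the handling of `#` lines is an assumption about the format, not something stated in this thread):

```python
import csv


def count_singletons(count_table_path):
    """Return (number of unique sequences, number whose total abundance is 1)."""
    n_unique = 0
    n_singletons = 0
    header_seen = False
    with open(count_table_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and '#' metadata lines (assumed format detail).
            if not row or row[0].startswith("#"):
                continue
            if not header_seen:
                # First real line is the header: Representative_Sequence, total, groups...
                header_seen = True
                continue
            n_unique += 1
            if int(row[1]) == 1:  # column 2 holds the total abundance
                n_singletons += 1
    return n_unique, n_singletons
```

For example, `count_singletons("final.count_table")` (a hypothetical file name) returns a pair of counts; a large singleton fraction would make the distance matrix correspondingly larger.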

Any advice or suggestions are appreciated - I just want to get this data processed.

How did you precluster?

I simply ran the following, using the fasta and count files generated by unique.seqs:

pre.cluster(fasta=current, count=current, diffs=4)

What version of mothur are you using? What are these data? PE250? V4? I suspect not, since you are using diffs=4. I’m not sure who advised doing dist.seqs and cluster, but that’s going to take a long time compared to cluster.split. I’m concerned that you have a data quality problem and that, by going to a larger dataset, those problems are being exacerbated.

Pat
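To put the runtime difference in perspective: one matrix over n unique sequences involves n(n-1)/2 pairwise comparisons, while splitting by taxonomy (as cluster.split with splitmethod=classify does) only compares sequences that fall in the same group. A minimal Python sketch with made-up group sizes, not numbers from this data set:

```python
def full_pairs(n):
    """Pairwise comparisons for a single matrix over n unique sequences."""
    return n * (n - 1) // 2


def split_pairs(group_sizes):
    """Pairwise comparisons when sequences are only compared within their own group."""
    return sum(full_pairs(g) for g in group_sizes)


# Hypothetical group sizes, chosen only for illustration.
groups = [200_000, 150_000, 100_000, 50_000]
print(full_pairs(sum(groups)))  # 124999750000
print(split_pairs(groups))      # 37499750000
```

With these four uneven groups the split version needs roughly a third as many comparisons, and the clustering step also works on correspondingly smaller matrices.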

Pat,

I am using mothur_1.42.1. I had some issues getting the data through pre.cluster with 1.41 and 1.41.2, so in an earlier thread you suggested I upgrade, and that fixed the issue.

It’s 16S amplicon data: PE250, V3V4. I’m not saying there isn’t a data quality issue here, but we have worked with similar data in the past and never run into this situation. It is a much larger data set because it combines 7 separate sequencing runs. Do you have any recommendations for figuring out whether there is a data quality issue with some or all of the data?
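The diffs=4 hint presumably comes from the rule of thumb in the mothur MiSeq SOP of allowing about 1 difference per 100 bp of sequence. A minimal sketch; the ~253 bp V4 length is approximate and the 450 bp V3V4 length is an assumed value, not one measured from these data:

```python
def suggested_diffs(amplicon_length_bp: int) -> int:
    """About 1 allowed difference per 100 bp, rounded down (rule of thumb only)."""
    return amplicon_length_bp // 100


# Approximate lengths; 450 bp for V3V4 is an assumption for illustration.
for region, length in [("V4", 253), ("V3V4", 450)]:
    print(region, suggested_diffs(length))
```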

When I ran cluster.split, it returned the following error messages, which is why it was suggested that I take the approach above.

Command: cluster.split(fasta=AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta,count=AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.pick.count_table,taxonomy=AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.nr_v128.wang.pick.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.03,processors=32)

Part of log file output:
It took 709838 seconds to split the distance file.
AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.9.dist
AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.75.dist
AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.93.dist
AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.116.dist
AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.89.dist
AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.218.dist
AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.242.dist
AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.247.dist
AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.217.dist
AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.143.dist
AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.336.dist
AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.394.dist
AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.254.dist
AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.444.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.444.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.413.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.453.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.366.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.312.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.463.dist

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
0 0 0 3 0 0 0 0 1 0 0 0

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.455.dist

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
0 12 0 3 0 1 0 0.8 1 0.8 0 0

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.368.dist
[ERROR]: Could not open AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.455.opti_mcc.list

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
18 27 0 0 1 1 1 1 1 1 1 1

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.345.dist
[ERROR]: Could not open AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.368.opti_mcc.list

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
0 6 0 15 0 1 0 0.285714 1 0.285714 0 0

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.473.dist

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
0 3 0 3 0 1 0 0.5 1 0.5 0 0

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.292.dist
[ERROR]: Could not open AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.345.opti_mcc.list

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
48 3517 2 3 0.941176 0.999432 0.96 0.999148 0.96 0.998599 0.949833 0.950495

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.122.dist

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
0 0 0 3 0 0 0 0 1 0 0 0

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.320.dist
[ERROR]: Could not open AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.473.opti_mcc.list

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
36 47 0 8 0.818182 1 1 0.854545 1 0.912088 0.836166 0.9

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.376.dist

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
4 17 0 0 1 1 1 1 1 1 1 1

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.254.dist

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
3 75 0 0 1 1 1 1 1 1 1 1

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.392.dist
[ERROR]: Could not open AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.292.opti_mcc.list

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
0 6 0 15 0 1 0 0.285714 1 0.285714 0 0

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.422.dist
[ERROR]: Could not open AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.122.opti_mcc.list

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
153 1598 11 8 0.950311 0.993163 0.932927 0.995019 0.932927 0.989266 0.935678 0.941538

[ERROR]: Could not open AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.320.opti_mcc.list

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
19 538 4 0 1 0.99262 0.826087 1 0.826087 0.99287 0.905533 0.904762

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.360.dist
[ERROR]: Could not open AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.376.opti_mcc.list

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
113 696 9 43 0.724359 0.987234 0.92623 0.941813 0.92623 0.939605 0.785935 0.81295

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.370.dist
[ERROR]: Could not open AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.254.opti_mcc.list

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
20 721 0 0 1 1 1 1 1 1 1 1

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.142.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.394.dist
[ERROR]: Could not open AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.370.opti_mcc.list

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
396 3500 12 8 0.980198 0.996583 0.970588 0.997719 0.970588 0.994893 0.972535 0.975369

**** Exceeded maximum allowed command errors, quitting ****
[ERROR]: Could not open AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.392.opti_mcc.list

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
0 40 0 15 0 1 0 0.727273 1 0.727273 0 0

**** Exceeded maximum allowed command errors, quitting ****
[ERROR]: Could not open AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.422.opti_mcc.list

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
33 186 2 10 0.767442 0.989362 0.942857 0.94898 0.942857 0.948052 0.82155 0.846154

**** Exceeded maximum allowed command errors, quitting ****
[ERROR]: Could not open AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.360.opti_mcc.list

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
51 202 0 0 1 1 1 1 1 1 1 1

**** Exceeded maximum allowed command errors, quitting ****
[ERROR]: Could not open AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.394.opti_mcc.list

tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
52 297 1 1 0.981132 0.996644 0.981132 0.996644 0.981132 0.994302 0.977776 0.981132

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.390.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.339.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.420.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.456.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.449.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.375.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.408.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.468.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.451.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.385.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.459.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.424.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.348.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.474.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.369.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.465.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.425.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.421.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.426.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.464.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.450.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.158.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.416.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.447.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.442.dist

Clustering AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.172.dist

Detected 14 [ERROR] messages, please review.

srun: error: UV00000395-P001: task 0: Exited with exit code 1
Mothur Run Failed with exit code 1 !!!
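The tp/tn/fp/fn lines in the log are the confusion-matrix counts that OptiClust reports for each split file, and the remaining columns are derived from them. A minimal sketch (not mothur's own code) that recomputes most of those columns, which can help sanity-check a row such as `48 3517 2 3 ...`. The fdr column is left out because the values printed in this log do not follow the textbook fp/(tp+fp) definition:

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Recompute several OptiClust log columns from the four counts.

    Ratios with a zero denominator are reported as 0, which matches the
    degenerate rows in the log (e.g. '0 0 0 3 ...').
    """
    def ratio(num, den):
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        "f1score": ratio(2 * tp, 2 * tp + fp + fn),
    }


print(round(confusion_metrics(48, 3517, 2, 3)["mcc"], 6))  # 0.949833
```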

I would appreciate any recommendations.

I would recommend running the cluster.split command in two parts (see https://mothur.org/wiki/Cluster.split#file). Mothur will first split the data into taxonomic groups and calculate a distance matrix for each group; then you run the clustering step. Here’s how to do that:

mothur > cluster.split(fasta=AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta,count=AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.pick.count_table,taxonomy=AmphStability.trim.contigs.good.unique.good.filter.unique.precluster.pick.nr_v128.wang.pick.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.03,processors=32, cluster=f)

mothur > cluster.split(file=current, processors=8)

(The processors value here is a guess.)

Things to consider: the more processors you use, the more RAM is required, because each processor loads a distance matrix into memory for processing.
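A crude way to judge how many processors the second step can afford is to look at the sizes of the split distance files. A minimal Python sketch, assuming (1) on-disk .dist size is a rough proxy for in-memory size, which may not hold for mothur's internal representation, and (2) the worst case where the largest matrices are loaded at the same time:

```python
from pathlib import Path


def worst_case_matrix_bytes(dist_dir: str, processors: int, pattern: str = "*.dist") -> int:
    """Sum of the `processors` largest distance files in `dist_dir` (rough heuristic)."""
    sizes = sorted((p.stat().st_size for p in Path(dist_dir).glob(pattern)), reverse=True)
    return sum(sizes[:processors])


if __name__ == "__main__":
    for n in (8, 16, 32):
        gib = worst_case_matrix_bytes(".", n) / 2**30
        print(f"{n} processors: ~{gib:.1f} GiB of distance files loaded at once")
```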

Also, the errors about opening the list files could be caused by a process failing due to lack of memory, or by a process failing because you have run out of allocated disk space to write the files.
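To narrow down which split files failed, one option is to list the .dist files that never got a list file and check free space. A minimal Python sketch, assuming the naming seen in the log above (`<fasta>.<N>.dist` paired with `<fasta>.<N>.opti_mcc.list`) and that it is run in the directory holding the split files before any cleanup. Note that `shutil.disk_usage` reports free space on the file system, which may not reflect a per-user quota on a shared cluster:

```python
import shutil
from pathlib import Path


def missing_list_files(dist_dir: str) -> list[str]:
    """Names of split .dist files with no matching .opti_mcc.list file."""
    missing = []
    for dist in sorted(Path(dist_dir).glob("*.dist")):
        # <fasta>.<N>.dist -> <fasta>.<N>.opti_mcc.list (naming assumed from the log)
        expected = dist.with_name(dist.name[: -len(".dist")] + ".opti_mcc.list")
        if not expected.exists():
            missing.append(dist.name)
    return missing


def free_gib(path: str = ".") -> float:
    """Free space, in GiB, on the file system that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    print(f"free space: {free_gib():.1f} GiB")
    for name in missing_list_files("."):
        print("no list file for", name)
```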
