Segmentation Fault in Cluster

I’m trying to improve my pipeline for analyzing full-length 16S sequences from PacBio CCS, but since some recent changes I’ve been getting segmentation faults at the clustering step, and I’m not sure why.

```
mothur > align.seqs(fasta=Chun_0010.filter.fasta, flip=t, reference=/home/UNIXHOME/bbowman/data/references/16S/silva.both.align, processors=16)
mothur > summary.seqs(fasta=Chun_0010.filter.align)
mothur > screen.seqs(start=5251, fasta=Chun_0010.filter.align, end=38908, processors=16)
mothur > filter.seqs(fasta=Chun_0010.filter.good.align, vertical=T, processors=16)
mothur > unique.seqs(fasta=Chun_0010.filter.good.filter.fasta)
mothur > pre.cluster(diffs=4, fasta=Chun_0010.filter.good.filter.unique.fasta, name=Chun_0010.filter.good.filter.names)
mothur > chimera.uchime(fasta=Chun_0010.filter.good.filter.unique.precluster.fasta, processors=16, reference=/home/UNIXHOME/bbowman/data/references/16S/silva.gold.align)
mothur > remove.seqs(fasta=Chun_0010.filter.good.filter.unique.precluster.fasta, accnos=Chun_0010.filter.good.filter.unique.precluster.uchime.accnos)
mothur > dist.seqs(output=lt, fasta=Chun_0010.filter.good.filter.unique.precluster.pick.fasta, calc=onegap, processors=16, countends=F)
mothur > cluster(phylip=Chun_0010.filter.good.filter.unique.precluster.pick.phylip.dist, name=Chun_0010.filter.good.filter.unique.precluster.names, method=average)
```

The pipeline consistently reaches the clustering step but then segfaults while reading in the matrix. I’ve repeated this with both the version of mothur I already had installed (v1.30) and the most recent stable build (v1.33.2) and saw the same behavior.
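Since the crash happens while the matrix is being read, one cheap thing to rule out is a malformed or truncated distance file. Below is a minimal Python sketch (not part of mothur) that checks a lower-triangle PHYLIP matrix, assuming the layout that `dist.seqs(output=lt)` writes: a first line with the sequence count N, then one row per sequence containing the name followed by one distance to each earlier sequence. The function name `check_lower_triangle` is hypothetical, and a clean result does not rule out a bug in mothur itself.

```python
import math
from typing import Iterable, List


def check_lower_triangle(lines: Iterable[str]) -> List[str]:
    """Return a list of structural problems in a lower-triangle PHYLIP matrix.

    Assumed layout: the first line is the sequence count N, followed by N rows;
    row i (counting from 0) is a sequence name followed by i distances.
    An empty list means no problems were detected.
    """
    it = iter(lines)
    header = next(it, "").split()
    if len(header) != 1 or not header[0].isdigit():
        return [f"bad header line: {' '.join(header)!r}"]
    expected_rows = int(header[0])

    problems: List[str] = []
    seen = set()
    row = 0
    for line in it:
        fields = line.split()
        if not fields:
            continue  # skip blank lines, e.g. a trailing newline
        name, values = fields[0], fields[1:]
        if name in seen:
            problems.append(f"row {row}: duplicate name {name}")
        seen.add(name)
        # Row i of a lower triangle must hold exactly i distances.
        if len(values) != row:
            problems.append(
                f"row {row} ({name}): expected {row} distances, found {len(values)}"
            )
        for value in values:
            try:
                d = float(value)
            except ValueError:
                problems.append(f"row {row} ({name}): non-numeric value {value!r}")
                break
            # Flag NaN, infinity, or negative distances.
            if not math.isfinite(d) or d < 0:
                problems.append(f"row {row} ({name}): suspicious distance {value!r}")
                break
        row += 1

    if row != expected_rows:
        problems.append(f"header says {expected_rows} sequences, found {row} rows")
    return problems
```

To use it, open the matrix (here `Chun_0010.filter.good.filter.unique.precluster.pick.phylip.dist`) and pass the file handle straight to `check_lower_triangle`, printing each returned string. A truncated file typically shows up as a row-count mismatch.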

I can send the raw data file and my logfiles if needed (the FASTA is ~3 MB zipped).

-Brett

How big is the distance matrix? This generally happens when people run out of RAM, which is common with large amounts of high-error-rate data (cough pacbio cough). You might try the cluster.split approach.
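For a sense of scale, a lower-triangle matrix of N sequences holds N(N-1)/2 distances, so doubling the number of unique sequences roughly quadruples the number of entries. The Python sketch below does that arithmetic; the function names are hypothetical, and `bytes_per_entry` is an assumed placeholder rather than mothur's actual in-memory layout, so treat the result only as an order-of-magnitude guide.

```python
def pairwise_entries(n_seqs: int) -> int:
    """Number of distances in a lower-triangle matrix of n_seqs sequences."""
    if n_seqs < 0:
        raise ValueError("n_seqs must be non-negative")
    return n_seqs * (n_seqs - 1) // 2


def rough_memory_gib(n_seqs: int, bytes_per_entry: int = 12) -> float:
    """Order-of-magnitude RAM estimate if every distance were held in memory.

    bytes_per_entry is an assumed placeholder (for example two 4-byte indices
    plus a 4-byte float), not mothur's real representation. If distances above
    a cutoff are discarded on reading, fewer entries would actually be kept.
    """
    return pairwise_entries(n_seqs) * bytes_per_entry / 2**30
```

For example, 10,000 unique sequences give 49,995,000 pairwise distances, which is why splitting the problem (as cluster.split does) can help.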

My largest distance matrix is ~500 MB, but I also see this intermittently with matrices under 100 MB, so I suspect the cause is something other than size.

I’m seeing the same thing with a 15 MB dist file. Nearest neighbor fails every time, and average neighbor fails some of the time. mothur 1.32 clusters the same dist file without any problem, so it must be a bug.

If you guys could compress the distance matrix and the name or count file and post them on Google Drive or email them to us, we can take a look. The exact command you’re running would be helpful too.
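One way to package the files is Python's standard `tarfile` module writing a gzip-compressed archive. This is a minimal sketch: the function name `bundle_for_upload` is hypothetical, and it stores only the base file names so the archive does not carry local directory paths.

```python
import os
import tarfile
from typing import Iterable


def bundle_for_upload(paths: Iterable[str], out_path: str) -> str:
    """Pack the given files into a gzip-compressed tar archive at out_path."""
    files = list(paths)
    # Check inputs first so a missing file doesn't leave a partial archive.
    missing = [p for p in files if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(f"missing input files: {missing}")
    with tarfile.open(out_path, "w:gz") as tar:
        for path in files:
            # Store only the file name, not the full directory path.
            tar.add(path, arcname=os.path.basename(path))
    return out_path
```

For the files in this thread, the inputs would be the `phylip` and `name` files from the `cluster` command, e.g. `bundle_for_upload(["Chun_0010.filter.good.filter.unique.precluster.pick.phylip.dist", "Chun_0010.filter.good.filter.unique.precluster.names"], "Chun_0010_cluster_files.tar.gz")`, where the archive name is only an example.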

Email sent.