Distance Matrix too big!

Hello, Dr. Schloss and others,

I am trying to define the OTUs from my RTG.fasta data (89 MB, about 80K sequences). These are the commands I used:

align.seqs(candidate=RTG.fasta, template=reference.fasta, flip=T, processors=8)
filter.seqs(fasta=RTG.align, vertical=T, processors=8)
dist.seqs(fasta=RTG.filter.fasta, cutoff=0.03, processors=8, output=lt)
cluster(phylip=RTG.filter.phylip.dist, method=furthest, cutoff=0.03)

The problem is that dist.seqs produced a huge 82 GB distance matrix. The cluster command always fails while reading the matrix, even when I tried it on a supercomputer node.

Is there any way to reduce the size of the distance matrix, or another way to do the clustering? Thank you!

Yeah, if you follow our denoising pipeline on the Schloss SOP page you’ll reduce the complexity considerably. It doesn’t look like you’re denoising your sequences, trimming them to the same alignment space, using unique.seqs, checking for chimeras, or running pre.cluster. Not only do these steps reduce the error rate, but they also make dist.seqs and cluster run much faster.
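
To see why cutting down the number of sequences helps so much: a lower-triangle matrix holds n(n-1)/2 distances, so the count grows with the square of the number of sequences. Here is a rough Python sketch of that arithmetic. The 80,000 figure is the approximate count from the original post; the 20,000 figure is a made-up illustration of what might remain after unique.seqs and pre.cluster, not a number from this dataset.

```python
def pairwise_count(n_seqs: int) -> int:
    """Number of distances in a lower-triangle matrix of n_seqs sequences."""
    return n_seqs * (n_seqs - 1) // 2


all_seqs = 80_000     # approximate count from the original post
unique_seqs = 20_000  # hypothetical count after unique.seqs / pre.cluster

before = pairwise_count(all_seqs)
after = pairwise_count(unique_seqs)

print(f"{before:,} distances before")
print(f"{after:,} distances after")
print(f"about {before / after:.0f}x fewer distances to compute and store")
```

With these assumed numbers, roughly 3.2 billion distances drop to about 200 million, so a 4x reduction in sequences gives about a 16x reduction in the matrix. The real saving depends on how redundant your reads are.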

Thank you, Dr. Schloss,

I have already run trim.seqs; RTG.fasta is the file I renamed after that step. I will try unique.seqs, chimera checking, and pre.cluster and see what happens. Thanks a lot again.