Hello, Dr. Schloss and others,
I am trying to define OTUs from my RTG.fasta data (89 MB, about 80K sequences). These are the commands I used:
align.seqs(candidate=RTG.fasta, template=reference.fasta, flip=T, processors=8)
filter.seqs(fasta=RTG.align, vertical=T, processors=8)
dist.seqs(fasta=RTG.filter.fasta, cutoff=0.03, processors=8, output=lt)
cluster(phylip=RTG.filter.phylip.dist, method=furthest, cutoff=0.03)
However, after dist.seqs finished, it had produced a huge 82 GB distance matrix. The cluster command always fails while reading the matrix, even when I ran it on a supercomputer node.
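For a rough sense of scale, here is a back-of-the-envelope sketch in Python (not part of mothur) that counts the pairwise distances a full lower-triangle matrix holds for about 80K sequences. The function names and the `bytes_per_distance` values are hypothetical, chosen only for illustration. I do not know how many bytes mothur actually writes per distance.

```python
def lower_triangle_pairs(n_seqs: int) -> int:
    """Count the pairwise distances in a lower-triangle matrix of n_seqs sequences."""
    return n_seqs * (n_seqs - 1) // 2


def estimated_size_gb(n_seqs: int, bytes_per_distance: int) -> float:
    """Rough file size in GB (10^9 bytes), assuming a fixed number of bytes per distance.

    bytes_per_distance is an assumed value, not something taken from mothur.
    """
    return lower_triangle_pairs(n_seqs) * bytes_per_distance / 1e9


if __name__ == "__main__":
    n = 80_000  # "about 80K sequences" from the post
    print(f"pairwise distances: {lower_triangle_pairs(n):,}")
    # Try a few assumed per-distance sizes to see how quickly the file grows.
    for b in (8, 16, 24):
        print(f"{b} bytes per distance -> about {estimated_size_gb(n, b):.1f} GB")
```

Because the number of pairs grows with the square of the number of sequences, a matrix of tens of gigabytes is plausible for this dataset if every pairwise distance is written out.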
Is there any way to reduce the size of the distance matrix, or another way to do the clustering? Thank you!