dist.seqs taking a lot of disk space

Hi All,

I am trying to create a distance matrix with the dist.seqs() command, using a cutoff of 0.20.
I have 2,508,287 sequences in the All_Samples.good.unique1.good.filter.unique.precluster.pick.pick.fasta file.
The command I am using is:
mothur "#dist.seqs(fasta=All_Samples.good.unique1.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.20, processors=48)"

It is creating 48 temporary files, and they are taking up a lot of disk space. After 2 days, each file is larger than 350 GB, and I am running out of space.
I would like to know whether I am doing anything wrong and how I could resolve this.
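A quick back-of-the-envelope estimate shows why the temporary files grow so large: for N sequences, dist.seqs has to consider every unordered pair, which is N(N-1)/2 comparisons. The sketch below computes that count and a rough output size, assuming column-format output with one line per pair kept under the cutoff. The helper names, the line length in bytes, and the fraction of pairs within the cutoff are hypothetical inputs for illustration, not mothur values.

```python
# Rough estimate of the work and output size for an all-vs-all distance matrix.
# ASSUMPTIONS (hypothetical, not mothur constants): one output line per pair kept
# under the cutoff, a fixed average line length, and a guessed fraction kept.

def num_pairs(n: int) -> int:
    """Number of unordered sequence pairs among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


def estimated_size_tb(n: int, bytes_per_line: int, fraction_kept: float) -> float:
    """Approximate output size in terabytes (1 TB = 1e12 bytes).

    bytes_per_line: assumed average length of one 'name1 name2 distance' line.
    fraction_kept: assumed fraction of pairs whose distance is <= the cutoff.
    """
    return num_pairs(n) * fraction_kept * bytes_per_line / 1e12


if __name__ == "__main__":
    n = 2508287
    print(f"pairs to compare: {num_pairs(n):,}")
    # Hypothetical values: ~60-byte lines and 10% of pairs within cutoff=0.20.
    print(f"rough output size: {estimated_size_tb(n, 60, 0.10):.1f} TB")
```

For 2,508,287 sequences this gives 3,145,750,583,041 pairs. Even if only a small fraction of them fall under the 0.20 cutoff, the column file can reach many terabytes, which is consistent with 48 files of more than 350 GB each.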


What region are you using? If you aren't sequencing the V4 region with V2 chemistry, then you need to read this:


Also, you should be sure to use cluster.split, which will result in smaller distance matrices…
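Roughly speaking, cluster.split divides the sequences into groups (for example, by taxonomy) and computes distances only within each group. The total number of comparisons then becomes a sum of much smaller n_i(n_i-1)/2 terms. The sketch below illustrates that arithmetic. The group sizes are hypothetical (10 equal groups). Real taxonomy-based groups are uneven, so the actual savings will differ.

```python
# Illustrates why splitting before computing distances shrinks the matrices:
# pairs are only compared within a group, never across groups.
# ASSUMPTION: the 10 equal-sized groups below are hypothetical, not real data.

def num_pairs(n: int) -> int:
    """Number of unordered pairs among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


def split_pairs(group_sizes: list[int]) -> int:
    """Total within-group pairs when distances are computed per group."""
    return sum(num_pairs(n) for n in group_sizes)


if __name__ == "__main__":
    total = 2508287
    k = 10  # hypothetical number of groups
    groups = [total // k] * k
    groups[0] += total - sum(groups)  # put the remainder in the first group
    whole = num_pairs(total)
    split = split_pairs(groups)
    print(f"single matrix: {whole:,} pairs")
    print(f"split into {k} groups: {split:,} pairs ({split / whole:.1%} of the original)")
```

With k roughly equal groups, the number of pairs drops to about 1/k of the original. Each group's matrix is also small enough to be processed and discarded on its own.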