dist.seqs taking a lot of disk space

newbie · January 25, 2016, 4:09pm

Hi All,

I am trying to create a distance matrix with the dist.seqs() command, using a cutoff of 0.20.
There are 2,508,287 sequences in my All_Samples.good.unique1.good.filter.unique.precluster.pick.pick.fasta file.
The command I am using is:
mothur "#dist.seqs(fasta=All_Samples.good.unique1.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.20, processors=48)"

It creates 48 temporary files, and after 2 days each of them is larger than 350 GB, so I am running out of disk space.
I would like to know whether I am doing anything wrong and how I could resolve this.

Thanks!

pschloss · January 28, 2016, 1:25pm

What region are you using? If you aren’t sequencing the V4 region with V2 chemistry, then you need to read this:

http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix%3F/
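A back-of-the-envelope count shows why the temporary files grow so fast. dist.seqs calculates the distance between every pair of sequences and writes the pairs at or below the cutoff, so n sequences mean n(n-1)/2 comparisons. The minimal Python sketch below assumes two hypothetical figures: the share of pairs that fall under the cutoff and the average length of one "name1 name2 distance" line in the column-format output. Neither is measured from this dataset.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered sequence pairs (n choose 2)."""
    return n * (n - 1) // 2


def estimated_matrix_bytes(n: int, fraction_kept: float, bytes_per_line: int) -> float:
    """Rough size of a column-format distance file.

    fraction_kept and bytes_per_line are assumptions, not mothur values:
    the share of pairs at or below the cutoff, and the average length
    of one 'name1 name2 distance' line.
    """
    return pairwise_comparisons(n) * fraction_kept * bytes_per_line


if __name__ == "__main__":
    n_seqs = 2_508_287  # sequence count reported in the question
    pairs = pairwise_comparisons(n_seqs)
    print(f"{pairs:,} pairwise comparisons")  # 3,145,750,583,041
    # Hypothetical: 10% of pairs under the 0.20 cutoff, ~60 bytes per line.
    tb = estimated_matrix_bytes(n_seqs, 0.10, 60) / 1e12
    print(f"~{tb:.1f} TB if 10% of pairs are kept")
```

Because the count grows with the square of the number of sequences, cutting the number of unique sequences matters far more than adding processors, which only divide the work and do not shrink the output.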

Also, you should be sure to use cluster.split, which will result in smaller distance matrices…
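cluster.split gets smaller matrices by first separating the sequences into groups (for example by taxonomy) and computing distances only within each group. The minimal Python sketch below illustrates why that helps. The 100 equal-sized groups are a hypothetical assumption; real taxonomic groups are uneven.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs (n choose 2)."""
    return n * (n - 1) // 2


def split_comparisons(group_sizes: list[int]) -> int:
    """Total pairs when distances are computed only within each group."""
    return sum(pairwise_comparisons(size) for size in group_sizes)


if __name__ == "__main__":
    n_seqs = 2_508_287
    k = 100  # hypothetical number of groups, sizes as equal as possible
    sizes = [n_seqs // k + (1 if i < n_seqs % k else 0) for i in range(k)]
    whole = pairwise_comparisons(n_seqs)
    split = split_comparisons(sizes)
    print(f"one matrix: {whole:,} pairs")
    print(f"split into {k} groups: {split:,} pairs ({whole / split:.0f}x fewer)")
```

With k equal groups the total is roughly 1/k of the single matrix. Uneven groups save less, because the largest group dominates the total.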

Pat
