HUGE dist file when running Eukarya analysis

mafernandez · July 24, 2019, 8:17am

Hello there.

I am running a MiSeq analysis of 18S rDNA (Eukarya) from 11 samples, following the MiSeq SOP (mothur v.1.42) on a 16-processor server (64 GB RAM). From previous experience, I know that my primer pair amplifies a lot of diversity (from algae to insects), so I expect the dist file to be very big, even with cutoff=0.03.

However, in this experiment it is extremely big for such a small number of samples. The command dist.seqs(fasta=current, cutoff=0.03, output=lt) took 61 hours to complete and, consequently, the cluster command is taking very long too.

Is there anything I could change in my commands to reduce the size of the dist file?
I use output=lt because I want to get representative sequences afterwards, and in my experience this is the best output format for that.

Thank you very much

pschloss · July 30, 2019, 3:01pm

I would remove the output=lt option. You might also run cluster.split instead of dist.seqs/cluster. You can get the representative sequence from each OTU with get.oturep using the method=abundance option. In our experience, this effectively gives the same output as using the distances and uses far less RAM.

Pat
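
For readers following along, the suggestion above could be sketched roughly as the batch below. This is only an illustration, not a tested recipe for this dataset: the file names (final.fasta, final.count_table, final.taxonomy, final.opti_mcc.list) are placeholders in the style of the MiSeq SOP, and taxlevel=4 is simply the value the SOP uses, so the right level for an 18S taxonomy is an assumption to check against your own classification.

```
# Placeholder file names in the style of the MiSeq SOP; substitute your own.
# cluster.split bins sequences by taxonomy and then calculates distances and
# clusters within each bin, so no single large distance file is needed.
# taxlevel=4 is the SOP's value and is an assumption for 18S data.
cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=4, cutoff=0.03)

# Pick the most abundant sequence in each OTU as its representative.
# No distance matrix is required with method=abundance.
# The list file name below is a placeholder; use the one cluster.split reports.
get.oturep(list=final.opti_mcc.list, count=final.count_table, fasta=final.fasta, method=abundance)
```

Because method=abundance works from the list and count files alone, the phylip-formatted matrix from output=lt is no longer needed for picking representatives.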

mafernandez · July 31, 2019, 10:20am

Thanks, Pat.

I have tried running cluster.split, since I already had the phylip.dist file. If this does not work, I will change the output from dist.seqs, as you suggested.

Keep working…
