HUGE dist file when running Eukarya analysis

Hello there.

I am running a MiSeq analysis on 11 samples of 18S rDNA (Eukarya) following the MiSeq SOP (mothur v.1.42) on a 16-processor server (64 GB RAM). From previous runs I know that our primer pair amplifies a lot of diversity (from algae to insects), so the dist file is always very big, even with cutoff=0.03.

However, in this experiment it is extremely big for such a small number of samples. It took 61 hours to complete the command (dist.seqs(fasta=current, cutoff=0.03, output=lt)) and, consequently, the cluster command is taking very long too.

Is there anything I could change in my commands to reduce the size of the dist file?
I use output=lt since I want to get representative sequences afterwards and in my experience this is the best output to do so.

Thank you very much

I would remove the output=lt option. You might also run cluster.split instead of dist.seqs followed by cluster. You can get the representative sequence from each OTU with get.oturep using the method=abundance option. In our experience, this gives effectively the same output as using the distances and uses far less RAM.
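To give a feel for why the lower-triangle output gets so large: a phylip-style lower-triangle matrix holds one distance for every pair of unique sequences, N(N-1)/2 values in total, whereas the column format only keeps the pairs at or below the cutoff. The sketch below is just that arithmetic. The sequence counts and the bytes-per-distance figure are illustrative assumptions, not numbers from this dataset.

```python
def lower_triangle_entries(n_seqs: int) -> int:
    """Number of pairwise distances in a full lower-triangle matrix."""
    return n_seqs * (n_seqs - 1) // 2


def estimated_size_gb(n_entries: int, bytes_per_entry: int = 7) -> float:
    """Rough on-disk size in GB.

    bytes_per_entry is an assumed average for one written distance plus
    its separator; the real value depends on the output precision.
    """
    return n_entries * bytes_per_entry / 1e9


if __name__ == "__main__":
    # Hypothetical numbers of unique sequences, for illustration only.
    for n in (10_000, 100_000, 500_000):
        entries = lower_triangle_entries(n)
        size = estimated_size_gb(entries)
        print(f"{n:>8} unique seqs -> {entries:>15,} distances, ~{size:,.1f} GB")
```

The count grows with the square of the number of unique sequences, so a highly diverse 18S dataset can produce a very large matrix even with only a few samples.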


Thanks, Pat.

I have tried running cluster.split, as I already had the phylip.dist file. If this does not work, I will try changing the output from dist.seqs, as you suggested.

Keep working…
