Problems with dist.seqs and Illumina reads

Hi there,

This is my first time using mothur, so I am still getting familiar with it.
I am using it as an OTU-picking alternative to my in-house pipeline. I have Illumina paired-end reads of a single hypervariable region of 16S rDNA. A typical sample has about 1 million reads (after processing, quality control, and chimera checking).
I have a single FASTA file as input for a sequence of commands (basically unique, align, screen, filter, unique, dist, cluster, with summaries after most steps).
The input file that goes into dist.seqs is 230 MB. However, dist.seqs takes forever (about 10 hours on 60 cores!) and produces a gigantic 79 GB *.dist file that the next command (cluster) refuses to read (at least, this is where I am stuck).
So, what is wrong? Is mothur not aimed at dealing with big datasets? If so, I apologize; I thought it would be worth a try, since everybody seems to use it on NGS data.
The command for dist.seqs was (fasta=, cutoff=0.10, processors=60).
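
To give a rough sense of why the distance file can get this large, here is a minimal Python sketch of the arithmetic. It only assumes that the number of pairwise comparisons grows as n(n-1)/2 with the number of unique sequences n. The sequence counts, the fraction of pairs falling under the cutoff, and the bytes per output line are all hypothetical placeholders, not values measured from my data, and the function names are made up for illustration.

```python
def pairwise_count(n_unique: int) -> int:
    """Number of distinct sequence pairs among n_unique sequences."""
    return n_unique * (n_unique - 1) // 2


def estimated_dist_file_bytes(n_unique: int,
                              fraction_kept: float,
                              bytes_per_line: int) -> int:
    """Rough size of a three-column distance file.

    fraction_kept  : assumed share of pairs at or below the cutoff
                     (hypothetical; depends entirely on the dataset)
    bytes_per_line : assumed average length of one output line
                     (two sequence names plus a distance; hypothetical)
    """
    return int(pairwise_count(n_unique) * fraction_kept * bytes_per_line)


if __name__ == "__main__":
    # Hypothetical numbers of unique sequences, for illustration only.
    for n in (10_000, 100_000, 500_000):
        size_gb = estimated_dist_file_bytes(n, 0.05, 30) / 1e9
        print(f"{n:>8} unique seqs: {pairwise_count(n):>16,} pairs, "
              f"~{size_gb:,.1f} GB if 5% of pairs are kept")
```

The point is only that the output scales with the square of the number of unique sequences, so anything that reduces that number before dist.seqs has a large effect on both run time and file size.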

Any help is appreciated!

/edit: Just realized we are using v1.25 as part of the QIIME 1.7.0 installation. I'll ask my PI to install the newest version. Could that already solve the problems?

So why run mothur through QIIME? :slight_smile:

Yes, 1.25 is more than a year old and there are a lot of new features in mothur for processing paired-end reads. I’d strongly encourage you to check out the mothur MiSeq SOP as well as the associated AEM paper that we published over the summer. Once you’ve done this, if you have specific questions, feel free to holler.

Pat