Hi there,
This is my first time using mothur, so I am still getting familiar with it.
First, I am using it as an alternative to my in-house OTU-picking pipeline. I have Illumina paired-end reads of a single hypervariable region of the 16S rDNA. A typical sample has about 1 million reads (after processing, quality control and chimera checking).
I have a single FASTA file as input for a sequence of commands (basically unique, align, screen, filter, unique, dist and cluster, with a summary after most steps).
The input file passed to dist.seqs is 230MB. However, dist.seqs takes forever (about 10 hours on 60 cores!) and produces a gigantic 79GB .dist file that the next command (cluster) refuses to read (at least, this is where I am stuck).
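For scale, a rough back-of-envelope sketch: dist.seqs compares every pair of sequences, so N unique sequences give N(N-1)/2 comparisons, which grows quadratically. The snippet below inverts that formula to estimate how many unique sequences a 79GB distance file implies. The bytes-per-line figure is a HYPOTHETICAL assumption (real line length depends on the sequence names), and I am assuming that, with cutoff=0.10, only distances at or below the cutoff are written, so the true number of comparisons is at least this large.

```python
import math

def pairwise_comparisons(n_seqs: int) -> int:
    """Number of distinct pairs among n_seqs sequences: n(n-1)/2."""
    return n_seqs * (n_seqs - 1) // 2

def min_seqs_for_pairs(n_pairs: int) -> int:
    """Smallest n with n(n-1)/2 >= n_pairs (exact integer inverse)."""
    # floor of the real root (1 + sqrt(1 + 8p)) / 2, computed without floats
    n = (1 + math.isqrt(1 + 8 * n_pairs)) // 2
    if pairwise_comparisons(n) < n_pairs:
        n += 1
    return n

# HYPOTHETICAL: assume ~30 bytes per line of a column-format .dist file
# (two sequence names plus a distance); adjust for your own name lengths.
BYTES_PER_LINE = 30
dist_file_bytes = 79 * 10**9  # the 79GB file from the post

lines_written = dist_file_bytes // BYTES_PER_LINE
print(f"distances written: ~{lines_written:,}")
print(f"unique sequences implied: at least ~{min_seqs_for_pairs(lines_written):,}")
print(f"pairs for 100,000 unique sequences: {pairwise_comparisons(100_000):,}")
```

Because the count is quadratic, halving the number of unique sequences cuts the comparisons (and roughly the file size) by about a factor of four.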
So, what is wrong? Is mothur not meant for big datasets? If so, I apologize; I thought it would be worth a try, since everybody seems to use it on NGS data.
The dist.seqs command was: dist.seqs(fasta=, cutoff=0.10, processors=60).
Any help is appreciated!
/edit: I just realized we are using v1.25, as part of the QIIME 1.7.0 installation. I'll ask my PI to install the newest version. Could that already solve the problems?