Hi,
I’m a mothur beginner. Is it plausible that the “cluster” command takes only 49 seconds for 8462 unique sequences on a simple (not particularly powerful) computer? I understood it was supposed to be a very long step.
Here is what I did:
mothur > dist.seqs(fasta=majorque.pick.fasta, cutoff=0.15, processors=2)
Output File Name:
majorque.pick.dist
It took 224 to calculate the distances for 8462 sequences.
mothur > cluster(column=majorque.pick.dist, name=majorque.pick.names)
********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||
changed cutoff to 0.0540724
Output File Names:
majorque.pick.an.sabund
majorque.pick.an.rabund
majorque.pick.an.list
It took 49 seconds to cluster
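For scale, here is a quick back-of-the-envelope sketch (plain Python, not part of mothur) of how many pairwise distances 8462 unique sequences imply before any cutoff is applied. The function name is only illustrative.

```python
def pairwise_count(n_seqs: int) -> int:
    """Number of unordered sequence pairs: n * (n - 1) / 2."""
    return n_seqs * (n_seqs - 1) // 2


n_unique = 8462  # unique sequences reported in the dist.seqs output above
total_pairs = pairwise_count(n_unique)
print(total_pairs)  # 35798491
```

This is only an upper bound: with cutoff=0.15, dist.seqs does not save distances larger than 0.15, so the column file that cluster reads may hold far fewer pairs.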
Here is what I did from the beginning, following the SOP:
trim.flows
shhh.flows
trim.seqs
unique.seqs
align.seqs
screen.seqs
filter.seqs
unique.seqs
pre.cluster
chimera.uchime (tested with or without an additional align.seqs + filter.seqs step after chimera removal)
classify.seqs
remove.lineage (Mitochondria-Chloroplast-Archaea-Eukarya-unknown)
dist.seqs
cluster
I followed the SOP everywhere and used the recommended parameters, except for the following:
- I used 360-720 as parameters for trim.flows (with 450 flows, more than 50% of my sequences ended up in the scrap file).
- I did not use “trump=.” when I first filtered the sequences, because it shortened the mean sequence length from 420 to 257 bp. (By the way, I could not find an explanation for this on the wiki or the forum, so any explanation here is more than welcome; note that the sequences seem to align well and the overlap seems OK.) Anyway, after chimera removal I tested an additional align + filter step, this time using “trump=.”, and the results look the same, with the cluster command taking 70 sec.
Is there something I’m doing wrong?
Hoping I’m not missing something obvious here,
Thanks in advance for your help,
Marina