Hi Pat,
I am confused about whether pre.cluster needs aligned or unaligned sequences. I have seen you answer that it needs aligned sequences, but the pre.cluster documentation says:
"When using unaligned sequences, the pre.cluster command allows you to select between two alignment methods - gotoh and needleman - needleman is the default setting:
mothur > pre.cluster(fasta=sogin.unique.filter.unique.fasta, name=sogin.unique.filter.names, diffs=2, align=needleman)
The needleman algorithm penalizes the same amount for opening and extending a gap. Alternatively, you could use the gotoh algorithm, which charges a different penalty for opening (default=-2) and extending (default=-1) gaps:
mothur > pre.cluster(fasta=sogin.unique.filter.unique.fasta, name=sogin.unique.filter.names, diffs=2, align=gotoh)
Our experience has shown that the added parameters in the gotoh algorithm do not improve the pairwise alignment and increase the time required for the alignment."
I am running a large and diverse COI dataset for which no aligned reference is available (this is plankton, so about 15 animal phyla), and I was wondering how to get to the clustering step. Is the way to go to run cluster.split on the unaligned sequences and skip pre.cluster?
Thanks,
Leo