Pre.cluster in unaligned, or cluster split?

Hi Pat,

I am confused about whether pre.cluster needs aligned or unaligned sequences. I have seen you answer that it needs aligned sequences, but the pre.cluster documentation says:

"### align

When using unaligned sequences, the pre.cluster command allows you to select between two alignment methods - gotoh and needleman - needleman is the default setting:

mothur > pre.cluster(fasta=sogin.unique.filter.unique.fasta, name=sogin.unique.filter.names, diffs=2, align=needleman)

The needleman algorithm penalizes the same amount for opening and extending a gap. Alternatively, you could use the gotoh algorithm, which charges a different penalty for opening (default=-2) and extending (default=-1) gaps:

mothur > pre.cluster(fasta=sogin.unique.filter.unique.fasta, name=sogin.unique.filter.names, diffs=2, align=gotoh)

Our experience has shown that the added parameters in the gotoh algorithm do not improve the pairwise alignment and increase the time required for the alignment."
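To make the difference between the two gap models in the quoted text concrete, here is a minimal Python sketch. It is not mothur's code. The gotoh defaults (-2 to open, -1 to extend) come from the quote above; the flat needleman penalty of -2 and the convention that the opening penalty covers the first gap position are assumptions made only for illustration.

```python
def linear_gap_cost(length, gap=-2):
    """Needleman-style scoring: every gap position costs the same.

    The value -2 is a hypothetical penalty chosen for illustration.
    """
    return gap * length


def affine_gap_cost(length, gap_open=-2, gap_extend=-1):
    """Gotoh-style scoring: opening a gap and extending it cost differently.

    Assumed convention: the first gap position pays gap_open and each
    additional position pays gap_extend.
    """
    if length == 0:
        return 0
    return gap_open + gap_extend * (length - 1)


# Compare how the two models score gaps of increasing length.
for length in range(1, 5):
    print(length, linear_gap_cost(length), affine_gap_cost(length))
```

Under these assumptions a long gap is penalized less by the affine model than by the linear one, so the two models can disagree on whether one long gap or several short gaps gives the better alignment.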

I am running a large and diverse COI dataset for which no aligned reference is available (this is plankton, so about 15 animal phyla), and I was wondering how to get to a clustering point. Is the way forward to run cluster.split on the unaligned sequences and skip pre.cluster?



Hi Leo,

Even if you skip pre.cluster, you will need some type of alignment to calculate pairwise distances in cluster.split. You might check out pairwise.seqs, which is optimized a bit to speed up the pairwise distance calculations.


Hi Pat

But the cluster.split documentation says:

Mothur can cluster both aligned and unaligned sequences using the cluster.split command. The splitting process will use the dist.seqs command to calculate the distance files for aligned reads, and pairwise.seqs to calculate the distance matrices from unaligned reads.

Does it autodetect that they are not aligned?

So, I assume that this will be much faster than doing pairwise.seqs with all the sequences against all the sequences. Am I right?
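As a rough way to reason about this question, the Python sketch below counts pairwise comparisons for an all-against-all run versus comparisons made only within groups after a split. The group sizes are hypothetical, and the count ignores the cost of producing the split itself and any filtering a tool applies, so it says nothing definite about actual run time.

```python
def n_pairs(n):
    """Number of unordered pairs among n sequences."""
    return n * (n - 1) // 2


def pairs_all_vs_all(group_sizes):
    """Comparisons if every sequence is compared with every other one."""
    return n_pairs(sum(group_sizes))


def pairs_within_groups(group_sizes):
    """Comparisons if sequences are only compared inside their own group."""
    return sum(n_pairs(n) for n in group_sizes)


# Hypothetical split of 100,000 unique sequences into four groups.
group_sizes = [40_000, 30_000, 20_000, 10_000]
print("all vs all:   ", pairs_all_vs_all(group_sizes))
print("within groups:", pairs_within_groups(group_sizes))
```

The saving depends heavily on how even the split is: if one group holds most of the sequences, the within-group count approaches the all-against-all count.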


It may or may not be faster. pairwise.seqs allows you to use kmers to reduce the number of comparisons that are being made.
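The general idea of using kmers to cut down the number of comparisons can be sketched in Python as follows. This is an illustration of the technique only, not how pairwise.seqs is implemented: the k-mer size, the distance formula, the threshold, and all function names are assumptions. Pairs whose k-mer distance exceeds the threshold are dropped before any alignment would be attempted.

```python
from itertools import combinations


def kmer_set(seq, k=4):
    """Return the set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=4):
    """Assumed distance: 1 minus the shared k-mers over the smaller k-mer set."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 1.0
    return 1.0 - len(ka & kb) / min(len(ka), len(kb))


def candidate_pairs(seqs, k=4, max_kmer_dist=0.5):
    """Yield only the name pairs that are similar enough to be worth aligning."""
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        if kmer_distance(s1, s2, k) <= max_kmer_dist:
            yield n1, n2


# Toy sequences: "a" and "b" differ by one base, "c" is unrelated.
seqs = {
    "a": "ACGTACGTACGTTTGA",
    "b": "ACGTACGTACGATTGA",
    "c": "GGGCCCGGGCCCAAAT",
}
pairs = list(candidate_pairs(seqs))
print(pairs)
```

Only the surviving pairs would go on to the expensive pairwise alignment step, which is where the time saving would come from on a diverse dataset.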

