Pre.cluster in unaligned, or cluster split?

leocadio · November 23, 2023, 3:57pm

Hi Pat,

I am confused about aligned vs. unaligned input for pre.cluster. I have seen you answer that it needs aligned sequences, yet the pre.cluster documentation says:

"### align

When using unaligned sequences, the pre.cluster command allows you to select between two alignment methods - gotoh and needleman - needleman is the default setting:

mothur > pre.cluster(fasta=sogin.unique.filter.unique.fasta, name=sogin.unique.filter.names, diffs=2, align=needleman)

The needleman algorithm penalizes the same amount for opening and extending a gap. Alternatively, you could use the gotoh algorithm, which charges a different penalty for opening (default=-2) and extending (default=-1) gaps:

mothur > pre.cluster(fasta=sogin.unique.filter.unique.fasta, name=sogin.unique.filter.names, diffs=2, align=gotoh)

Our experience has shown that the added parameters in the gotoh algorithm do not improve the pairwise alignment and increase the time required for the alignment."
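To make the difference between the two gap models in the quoted text concrete, here is a minimal Python sketch. It is not mothur's code. The affine defaults (-2 to open, -1 to extend) come from the quote above; the linear penalty of -2 per gap position and the convention that the first gap position is charged only the opening penalty are assumptions made for illustration.

```python
def linear_gap_score(length, gap=-2):
    """Needleman-style scoring: every gap position costs the same.

    The value -2 is an assumed placeholder, not a documented mothur default.
    """
    return gap * length


def affine_gap_score(length, gap_open=-2, gap_extend=-1):
    """Gotoh-style scoring: opening a gap and extending it cost different amounts.

    Assumed convention: the first gap position is charged gap_open and each
    additional position is charged gap_extend.
    """
    if length == 0:
        return 0
    return gap_open + gap_extend * (length - 1)


if __name__ == "__main__":
    for n in (1, 2, 5):
        print(n, linear_gap_score(n), affine_gap_score(n))
```

Under these assumptions a single long gap is penalized less by the affine model than by the linear one, which is the only behavioral difference the two `align` options describe.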

I am working with a large and diverse COI dataset for which no aligned reference is available (this is plankton, so about 15 animal phyla), and I was wondering how to get to the clustering step. Is the way to go to skip pre.cluster and run cluster.split on the unaligned sequences?

Thanks,

Leo

pschloss · November 28, 2023, 6:10pm

Hi Leo,

Even if you skip pre.cluster, you will need some type of alignment to calculate pairwise distances in cluster.split. You might check out pairwise.seqs, which is optimized a bit to speed up the pairwise distance calculations.

Pat

leocadio · November 28, 2023, 6:32pm

Hi Pat

But the cluster.split documentation says:

"Mothur can cluster both aligned and unaligned sequences using the cluster.split command. The splitting process will use the dist.seqs command to calculate the distance files for aligned reads, and pairwise.seqs to calculate the distance matrices from unaligned reads."

Does it autodetect that the sequences are not aligned?

If so, I assume this will be much faster than running pairwise.seqs with all the sequences against all the sequences. Am I right?

Leo

pschloss · November 30, 2023, 5:12pm

It may or may not be faster. pairwise.seqs allows you to use kmers to reduce the number of comparisons that are being made.

Pat
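For readers unfamiliar with the idea of using kmers to reduce the number of comparisons, here is a rough Python sketch of the general technique: build a kmer profile per sequence and only pass pairs that share enough kmers on to the expensive pairwise alignment. This is not mothur's implementation. The function names, the kmer size, the similarity measure, and the `min_shared` threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations


def kmer_set(seq, k=7):
    """Return the set of all overlapping kmers of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a, b):
    """Fraction of shared kmers, relative to the smaller of the two sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def candidate_pairs(seqs, k=7, min_shared=0.5):
    """Return the name pairs worth aligning.

    seqs maps sequence names to unaligned sequences. Pairs whose kmer
    similarity falls below min_shared (an assumed threshold) are skipped,
    so they never reach the alignment step.
    """
    profiles = {name: kmer_set(seq, k) for name, seq in seqs.items()}
    return [
        (x, y)
        for x, y in combinations(sorted(profiles), 2)
        if kmer_similarity(profiles[x], profiles[y]) >= min_shared
    ]


if __name__ == "__main__":
    toy = {
        "a": "ACGTACGTACGTTTGA",
        "b": "ACGTACGTACGTTTGC",
        "c": "GGGGCCCCAAAATTTT",
    }
    print(candidate_pairs(toy, k=4))
```

In this toy example only the pair ("a", "b") survives the screen, so one alignment is computed instead of three. Whether such a screen saves time on a real dataset depends on how many pairs fall below the threshold, which is consistent with the "may or may not be faster" caveat above.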
