cluster and phylip and name file

Hello,

I have analysed approximately 16,000 sequences of enzyme-coding gene amplicons. I preprocessed the seqs in mothur (trim.seqs and so on), then reduced the data and generated a names file using unique.seqs.

After that I translated the unique seqs and aligned the amino acid sequences using Clustal Omega. Along the way I removed badly aligned sequences from my fasta file and also from the name and group files. I then calculated the pairwise distance matrix using MEGA and produced a lower-triangle matrix using Excel.

Now I want to move on to clustering and then to OTU-based analyses (rarefaction…). The mothur wiki says that this kind of phylip-formatted distance matrix should not be clustered with a names file; only column-formatted matrices should be clustered with a names file.

However, I am now trying the cluster command with the phylip and names files, but mothur seems to have hung. Or does it just take a long time?

I understand that if I cluster the seqs using only the phylip file without a names file, mothur clusters the unique seqs and has no information about the actual number of seqs in each OTU. All the OTU-based analyses would then be based only on the unique seqs.
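To illustrate what the names file contributes, here is a minimal Python sketch that counts how many reads each unique sequence stands for. It assumes the usual two-column names layout (representative name, then a comma-separated list of all identical sequences, including the representative); the function name and the example sequence names are made up for illustration.

```python
def count_sequences_per_unique(names_text):
    """Map each representative sequence name to the number of reads it represents."""
    counts = {}
    for line in names_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Column 1: representative name; column 2: comma-separated members
        representative, members = line.split(None, 1)
        counts[representative] = len(members.split(","))
    return counts


# Hypothetical two-line names file: seqA represents three reads, seqD only itself
example = "seqA\tseqA,seqB,seqC\nseqD\tseqD\n"
counts = count_sequences_per_unique(example)
print(counts)
```

Without this file, each representative would be treated as a single read, which is why abundance-based OTU analyses need it.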

How should I continue?

Antti Rissanen
Finland

Now I want to move on to clustering and then to OTU-based analyses (rarefaction…). The mothur wiki says that this kind of phylip-formatted distance matrix should not be clustered with a names file; only column-formatted matrices should be clustered with a names file.

Not true… You can run cluster(phylip=…, name=…, cutoff=…). Depending on your computer, this may take a while.

Hi,

I noticed that as well. I suppose the problem was with my lower-triangle matrix. I probably made a mistake when formatting it in Excel.

When I used the cluster command with a square matrix (output of the ClustalX distance calculations), clustering ran very quickly.
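For anyone who would rather avoid hand-formatting in a spreadsheet, below is a minimal Python sketch that writes a lower-triangle matrix from a square one. It assumes the layout as I understand it for mothur's phylip input: the number of sequences on the first line, then one row per sequence containing its name followed by the distances to all earlier sequences, with no diagonal. The function name, the tab separator, and the four-decimal formatting are my own choices, so please check the result against the wiki before relying on it.

```python
def square_to_lower_triangle(names, matrix):
    """Return lower-triangle text for a square distance matrix (list of lists)."""
    n = len(names)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square and match the number of names")
    lines = [str(n)]  # first line: number of sequences
    for i, name in enumerate(names):
        # Row i holds distances to sequences 0..i-1 only (diagonal omitted)
        distances = [f"{matrix[i][j]:.4f}" for j in range(i)]
        lines.append("\t".join([name] + distances))
    return "\n".join(lines) + "\n"


# Hypothetical three-sequence example
names = ["s1", "s2", "s3"]
square = [
    [0.0, 0.1, 0.2],
    [0.1, 0.0, 0.3],
    [0.2, 0.3, 0.0],
]
lower = square_to_lower_triangle(names, square)
print(lower)
```

The sequence names in the matrix have to match the representative names in the names file, so it is worth checking both after removing sequences by hand.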

Now I will start on the OTU analyses. Thanks.