average neighbor clustering problem

Hello,
I am trying to cluster my sequences into OTUs using the average neighbor algorithm and it won’t cluster below the unique level.

I keep getting the message “cutoff changed to 0”. If I set the cutoff to something (e.g., cutoff=0.10) it still gives me the error message.

The furthest neighbor method works fine and clusters the OTUs from unique to 0.18.

I have a column formatted distance matrix that was generated using ESPRIT.

Thanks for your help,
Jana

Jana,

  1. This is probably happening because you have large distances between groups that aren’t being captured in the distances below the cutoff. Try 0.20 or something higher. Alternatively, you should probably double check that your sequences overlap in the same region since non-overlapping sequences will create distances of infinity and screw things up in general.

  2. I’m not sure why you would mix and match ESPRIT and mothur. I’d strongly encourage you to do everything within mothur (e.g. sequence trimming, alignment, chimera checking, distance calculations, and clustering). I’m of course biased, but the two methods are fundamentally different.
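For anyone wanting to check point 1 on their own data, here is a minimal Python sketch that summarizes a column-formatted distance matrix. It assumes each line holds two sequence names and a distance separated by whitespace, and that each pair is listed at most once. The function name summarize_column_matrix and the example names are made up for illustration; this is not part of mothur or ESPRIT. If many pairs are missing from the file, or the largest stored distance sits near the cutoff, that may be related to the cutoff being changed.

```python
def summarize_column_matrix(lines, cutoff):
    """Summarize a column-format distance matrix.

    Assumes one 'seqA seqB distance' record per line and that each
    pair appears at most once. Only sequences that appear in at least
    one record are counted, so the missing-pair count is a lower bound.
    """
    names = set()
    pairs_listed = 0
    pairs_at_or_below_cutoff = 0
    max_distance = 0.0

    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        seq_a, seq_b, dist = fields[0], fields[1], float(fields[2])
        names.update((seq_a, seq_b))
        if seq_a == seq_b:
            continue  # ignore self-comparisons
        pairs_listed += 1
        max_distance = max(max_distance, dist)
        if dist <= cutoff:
            pairs_at_or_below_cutoff += 1

    n = len(names)
    possible_pairs = n * (n - 1) // 2
    return {
        "sequences": n,
        "possible_pairs": possible_pairs,
        "pairs_listed": pairs_listed,
        "pairs_missing": possible_pairs - pairs_listed,
        "pairs_at_or_below_cutoff": pairs_at_or_below_cutoff,
        "max_distance": max_distance,
    }


if __name__ == "__main__":
    # Tiny made-up example: four sequences, only three of six pairs stored.
    example = ["A B 0.05", "B C 0.12", "C D 0.30"]
    print(summarize_column_matrix(example, cutoff=0.20))
```

To run it on a real file, pass an open file handle instead of the list, since the function only iterates over lines. For a large matrix this reads one line at a time, so memory use is driven by the number of sequence names rather than the number of pairs.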

Hi Pat,
Thanks for your reply.

I don’t think this error is happening because I have large distances between groups. When I calculated the distances in ESPRIT I used a kmer distance cutoff of 0.5 (which roughly corresponds to a pairwise distance of ~0.18) to speed up the computation. This gets rid of the problem you mention about non-overlapping sequences and infinite distances.

BTW, I tried setting the cutoff to 0.20 and I got the same error message as before.

The reason I had to use ESPRIT was because I have 6000 ITS-LSU sequences that are ~1000 bp in length. These sequences were trimmed using phred/phrap and manually edited for errors. Because they are too divergent to align reliably (they span the Ascomycota), I wanted to generate a distance matrix based on pairwise alignments, and it was too many sequences for mothur to handle. So I only used the ESPRIT kmer distance computation followed by the needledist computation to generate the distance matrix on our university computer cluster. I didn’t use their hcluster algorithm.

Thanks for your help.

Best,
Jana

Hmm. I understand where you’re coming from. That is weird. Could you send us the distance matrix and names file so we can take a look? Have you tried the pairwise.seqs command, which does essentially what ESPRIT is doing without the kmer optimization?

Hi Pat,
I tried the pairwise.seqs option on my laptop with 4 processors and it was taking a really long time, which is why I used ESPRIT. I didn’t try to run it on the cluster though.

My distance matrix is 50 MB when zipped, which is too large to email. How can I get it to you?

Thanks!
Jana

Can you post it to the wiki and email me the link?

I made a Google Doc for the distance matrix and shared it with you, and I emailed you the .names file.

Thanks!
Jana