Average clustering

Hello Mothur Peeps

Based on the AEM manuscript provided with the last release of Mothur, I am trying to repeat an analysis I did previously, this time using average neighbor clustering rather than furthest neighbor (FN). With FN, everything worked beautifully: I used hcluster with a 0.15 distance cutoff on about 70,000 16S sequences.

I tried the new cluster.split command, and so far no go…

First, I tried splitting the distance matrix (column and name files) by distance. The analysis ran all the way through and I was able to make a shared file. However, when I opened it, it only contained OTUs for the ‘unique’ distance.
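For anyone following along: as I understand it, splitting by distance means breaking the column-format matrix into groups of sequences that share no distance at or below the cutoff, so each group can be clustered separately. Below is a minimal Python sketch of that grouping using union-find. It is not mothur's code; the function name `split_by_distance`, the `(seqA, seqB, distance)` tuple input, and the `<=` cutoff comparison are all my assumptions.

```python
def split_by_distance(rows, cutoff):
    """Hypothetical sketch: group sequences into connected components.

    rows: iterable of (name_a, name_b, distance) tuples, like the lines of a
    column-format distance file. A pair with distance <= cutoff (assumed
    comparison) links its two sequences into the same group.
    Returns the groups as sorted lists of names, in sorted order.
    """
    parent = {}

    def find(x):
        # Register x on first sight, then walk to its root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b, d in rows:
        # Register both names so unlinked sequences still form their own group.
        find(a)
        find(b)
        if d <= cutoff:
            union(a, b)

    groups = {}
    for name in parent:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


rows = [("A", "B", 0.02), ("B", "C", 0.10), ("D", "E", 0.03), ("C", "D", 0.40)]
print(split_by_distance(rows, 0.15))
```

With a 0.15 cutoff, A, B, and C end up in one group and D and E in another, because the only pair linking the two groups (C to D) is above the cutoff.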

Second, I tried splitting by taxonomy. With taxonomy level values of 1, 2, or 3, the distance file gets split, but mothur generates temporary name files that fill up my computer's storage (one file was 403 GB).

With values of 4, 5, or 6, I get a message that mothur cannot open name file XXX.name.temp, and it lists several such files.

I am currently trying cluster.split with the FASTA split. I will let you know how it turns out.

Just thought I would check whether anyone else is having similar problems.

Could you send your files to mothur.bugs@gmail.com and I will take a look?