Label in dist.seqs OR read.dist?

Hi all - I was hoping someone had some advice.

Dataset: 100,000 unique, screened, filtered, and unique (again) sequences
Machine: Mac OS 10.5.8 2.4GHz Intel Core Duo 4GB 667 MHz DDR2 SDRAM
Issue: I am interested in looking at distances up to and including 0.10. The problem is that I get a 6.11 GB distance file with the cutoff set at 0.10, so when I run read.dist I get the following:

mothur(48293) malloc: *** mmap(size=2097152) failed (error code=12)
*** error: can’t allocate region
*** set a breakpoint in malloc_error_break to debug

I am really only interested in a few distances (unique to 0.03, 0.05, and 0.10). Is there a way to calculate distances at specified values? If not, can anyone suggest a workaround?


You are running out of RAM. Are you trying to read the distance file so you can run the cluster command? If so, you should try the hcluster command. It uses much less RAM.
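For a sense of scale, here is a rough back-of-the-envelope sketch in Python. It only counts how many pairwise distances exist before any cutoff is applied. The `bytes_per_distance` argument is a placeholder assumption for illustration, not mothur's actual in-memory cost per distance.

```python
def pair_count(n_seqs: int) -> int:
    """Number of distinct sequence pairs, i.e. n * (n - 1) / 2."""
    return n_seqs * (n_seqs - 1) // 2


def rough_bytes(n_seqs: int, bytes_per_distance: int) -> int:
    """Very rough size if every pairwise distance were stored.

    bytes_per_distance is an assumed figure chosen by the caller.
    """
    return pair_count(n_seqs) * bytes_per_distance


if __name__ == "__main__":
    n = 100_000  # dataset size quoted in the question
    print(pair_count(n), "possible pairwise distances before the cutoff")
```

The cutoff only keeps the pairs at or below 0.10, but with this many candidate pairs even a small retained fraction can outgrow 4 GB of RAM.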

Ah! I thought the hcluster command could only be run after a read.dist. So am I correct that the only advantage of using cluster instead of hcluster is that the former implements three (3) clustering methods while the latter only uses furthest neighbor?

Not exactly… The next release will actually have all three methods for hcluster as well. We are noticing some performance differences between the two commands. First, with hcluster the matrix needs to be sorted, and this can take a long time. But, at least for furthest neighbor (fn), once the sort is done hcluster can be faster than cluster. The average neighbor (an) method in hcluster is quite slow and can cause memory problems.

Hope this helps,

Dear pschloss,
My problem is the same. I am running hcluster on a 146 GB distance file, and I am really only interested in a few labels (0.01, 0.03, 0.05, and 0.1) for my paper. Is there a way to cluster at specified values instead of waiting for hcluster to work through the 0.01, 0.02, 0.03 … 0.1 labels one after another?
Many thanks for your help.

As I said in the other post, don’t use hcluster.

As for the theme of your question, these are hierarchical algorithms and you have to go through 0.01 to get to 0.03.
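To illustrate why the lower levels cannot be skipped, here is a minimal Python sketch of furthest neighbor clustering on a toy distance matrix. This is not mothur's implementation; the function name, the four sequence names, and the distances are all made up for the example. Every merge has to be applied in order of increasing distance, and the only thing that can be limited is which labels get reported.

```python
from itertools import combinations


def furthest_neighbor_levels(names, dist, labels):
    """Toy furthest neighbor (complete linkage) clustering.

    dist maps frozenset({a, b}) to the distance between a and b.
    Returns {label: clusters} for the requested labels only.
    """
    clusters = [frozenset([n]) for n in names]

    def d(a, b):
        # Furthest neighbor: cluster distance is the largest member distance.
        return max(dist[frozenset((x, y))] for x in a for y in b)

    snapshots = {}
    pending = sorted(labels)
    while True:
        if len(clusters) > 1:
            a, b = min(combinations(clusters, 2), key=lambda p: d(*p))
            height = d(a, b)
        else:
            height = float("inf")
        # Report each requested label that lies below the next merge.
        while pending and pending[0] < height:
            snapshots[pending.pop(0)] = sorted(sorted(c) for c in clusters)
        if not pending or len(clusters) == 1:
            break
        # Apply the merge; later merges depend on this one.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return snapshots


# Hypothetical toy data: four sequences and their pairwise distances.
pairs = {("A", "B"): 0.01, ("C", "D"): 0.02, ("A", "C"): 0.04,
         ("B", "C"): 0.04, ("A", "D"): 0.05, ("B", "D"): 0.06}
toy_dist = {frozenset(k): v for k, v in pairs.items()}
result = furthest_neighbor_levels(["A", "B", "C", "D"], toy_dist, [0.03, 0.05, 0.10])
```

In this toy case the clusters reported at 0.03 exist only because the merges at 0.01 and 0.02 were applied first, so asking for fewer labels trims the output but not the work.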


Thanks, pschloss.
I don't know which hierarchical algorithm could solve my problem. Could you suggest some algorithms for me?

The default works best - average neighbor