Label in dist.seqs OR read.dist?

jarrod_s · January 31, 2010, 9:41pm

Hi all - I was hoping someone had some advice.

Dataset: 100,000 unique, screened, filtered, and unique (again) sequences
Machine: Mac OS 10.5.8 2.4GHz Intel Core Duo 4GB 667 MHz DDR2 SDRAM
Issue: I am interested in looking at distances up to and including 0.10. The problem is that I get a 6.11 GB distance file with the cutoff set at 0.10, so when I run read.dist I get the following:

mothur(48293) malloc: *** mmap(size=2097152) failed (error code=12)
*** error: can’t allocate region
*** set a breakpoint in malloc_error_break to debug

I am really only interested in a few distances (unique to 0.03, 0.05, and 0.10). Is there a way to calculate distances at specified values? If not, can anyone suggest a workaround?

Thanks
jarrod

westcott · February 1, 2010, 12:27pm

You are running out of RAM. Are you trying to read the distance file so you can run the cluster command? If so, you should try the hcluster command. It uses much less RAM. http://www.mothur.org/wiki/Hcluster

jarrod_s · February 1, 2010, 3:50pm

Ah! I thought the hcluster command could only be run after a read.dist. So am I correct that the only advantage of using cluster instead of hcluster is that the former implements three (3) clustering methods while the latter only uses furthest neighbor?

pschloss · February 1, 2010, 8:54pm

Not exactly… The next release will actually have all three methods for hcluster as well. We are noticing some performance differences that separate the two commands. First, with hcluster the matrix needs to be sorted, which can take a long time. But, at least for furthest neighbor (fn), once this is done hcluster can be faster than cluster. The average neighbor (an) method in hcluster is quite slow and can cause memory problems.

Hope this helps,
Pat

lamnguyen · August 28, 2014, 7:27am

Dear pschloss,
My problem is the same. I am running hcluster on a 146 GB distance file, and I am really only interested in a few labels (0.01, 0.03, 0.05, and 0.1) for my paper. Is there a way to cluster at specified values instead of waiting for hcluster to work through the 0.01, 0.02, 0.03 … 0.1 labels in turn?
Many thanks for your help.

pschloss · August 28, 2014, 5:55pm

As I said in the other post, don’t use hcluster.

As for the theme of your question, these are hierarchical algorithms and you have to go through 0.01 to get to 0.03.

Pat
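To illustrate why the intermediate levels cannot be skipped, here is a minimal Python sketch of furthest-neighbor agglomerative clustering on a toy distance set. It is not mothur's implementation: the function name `furthest_neighbor`, the sequence names, and the distances are all made up for illustration, and mothur's own label rounding and precision handling are not modelled. The point is only that the clusters reported at 0.03 are built from the merges that already happened below 0.01.

```python
from itertools import combinations


def furthest_neighbor(names, dist, labels):
    """Toy furthest-neighbor clustering; returns {label: list of clusters}.

    dist maps frozenset({a, b}) -> distance (hypothetical input format).
    """
    clusters = [frozenset([n]) for n in names]

    def linkage(a, b):
        # Furthest neighbor: cluster distance is the largest pairwise distance.
        return max(dist[frozenset((x, y))] for x in a for y in b)

    snapshots = {}
    pending = sorted(labels)
    while True:
        best = None
        if len(clusters) > 1:
            # Closest pair of clusters under the furthest-neighbor linkage.
            best = min(
                ((linkage(a, b), a, b) for a, b in combinations(clusters, 2)),
                key=lambda t: t[0],
            )
        # Record each requested label that the next merge would exceed.
        while pending and (best is None or best[0] > pending[0]):
            snapshots[pending.pop(0)] = sorted(sorted(c) for c in clusters)
        if best is None or not pending:
            break
        _, a, b = best
        # Every merge builds on the previous ones; none can be skipped.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return snapshots


# Made-up distances between four hypothetical sequences.
pairs = {
    ("A", "B"): 0.005, ("A", "C"): 0.02, ("B", "C"): 0.025,
    ("A", "D"): 0.08, ("B", "D"): 0.09, ("C", "D"): 0.07,
}
dist = {frozenset(k): v for k, v in pairs.items()}
snapshots = furthest_neighbor(["A", "B", "C", "D"], dist, [0.01, 0.03, 0.10])
for label, groups in snapshots.items():
    print(label, groups)
```

In this toy run, the cluster containing A, B, and C at 0.03 only exists because A and B were already joined below 0.01. The requested labels control which levels get reported, not which merges get computed.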

lamnguyen · September 3, 2014, 2:51am

Thanks, pschloss.
I don't know which hierarchical algorithm could solve my problem. Could you suggest some algorithms for me?
Thanks

pschloss · September 10, 2014, 5:49pm

The default works best - average neighbor
