Hi all,
I have a large data set of roughly 140,000 sequences remaining after removing chimeras and low-quality sequences. I'm able to build a distance matrix, though it is 63 GB. I'm trying to build a tree from it using clearcut through mothur, but I suspect the run has frozen my computer. After a day of running it appeared to have maxed out my memory (32 GB of RAM plus 9 GB of swap, all in use), and it has now been going for over two weeks. At the moment it uses only about 10% of one processor (out of 8), and the mothur prompt is still blinking. It hasn't given me any error messages, but I think it's hung.
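For reference, the commands I've been running are roughly the following (file names are from memory, so treat them as placeholders, and I may be misremembering the exact options):

mothur > dist.seqs(fasta=final.fasta, output=lt, processors=8)
mothur > clearcut(phylip=final.phylip.dist)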
I'm wondering whether there is an alternative way to analyze this data set and build a tree, or whether I'm simply out of luck since I don't have access to a computer with more resources. Is there a way to have clearcut use hard drive space instead of memory, the way the hcluster command does for large distance matrices? Is there another tree-building program anyone can recommend that can handle a data set this large? I used FastTree about a year ago (rough command in the P.S. below), but haven't used it since switching to a Linux machine, because clearcut was supported directly in mothur. Any suggestions or advice would be appreciated; please let me know if you need any further information. Thanks!
-Damon
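P.S. If FastTree turns out to be the better option, my recollection is that it runs straight off the alignment rather than a distance matrix, with something along the lines of (file names are just placeholders):

FastTree -nt -gtr final.align > final.tre

I haven't been able to double-check those options on my current machine, so please correct me if that's off.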