Computer Issues with hcluster

Hello mothur team,

We were following the Costello analysis for one of our datasets, and after the pre.cluster step we had about 120,000 sequences. Dist.seqs created a distance matrix with 17 million lines, totaling 160 GB. We tried to run the cluster command, but it depleted memory so badly that we were afraid the computer would shut off (it has happened before), so we aborted it. Since hcluster should have a smaller memory footprint, we tried that instead. The next morning, however, the computer had an error message saying "not enough disc space", and the mothur log said, "It took 70844 seconds to sort." We're running this on a MacPro with 2 dual-core processors and 9 GB of RAM, using mothur 1.9.
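For reference, the commands we ran were along these lines (the file names are placeholders for our actual files, and the cutoff/processor settings shown are just examples, not necessarily what we used):

```
mothur > dist.seqs(fasta=final.precluster.fasta, cutoff=0.25, processors=4)
mothur > cluster(column=final.precluster.dist, name=final.precluster.names)
mothur > hcluster(column=final.precluster.dist, name=final.precluster.names)
```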

We’re wondering what specifications are needed to run a dataset of this magnitude or larger (possibly 4-5 times larger). Any suggestions would be appreciated.

Bonnie and Diane

Are you sure you did the quality trimming and alignment filtering? One option is to try the cluster.split command: classify all of the sequences that came out of pre.cluster, then run cluster.split using a taxonomic level of 2 or 3 and see how that works.
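A minimal sketch of that workflow might look like the following. The file names are placeholders, and trainset.fasta / trainset.tax stand in for whichever reference alignment and taxonomy you classify against:

```
mothur > classify.seqs(fasta=final.precluster.fasta, name=final.precluster.names, template=trainset.fasta, taxonomy=trainset.tax)
mothur > cluster.split(fasta=final.precluster.fasta, name=final.precluster.names, taxonomy=final.precluster.taxonomy, taxlevel=3)
```

Splitting by taxonomy means mothur clusters each taxonomic group separately, so it never has to hold (or sort) the full distance matrix at once, which is what was exhausting your RAM and disk.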

Thanks, that helped. We had already done quality trimming, alignment, chimera checking, etc., using the same method as the Costello analysis. We'll move on from here and see what happens. :)