Hello mothur team,
We were following the Costello Analysis for one of our datasets, and after the pre.cluster step we had about 120,000 sequences. Dist.seqs created a distance matrix that has 17 million lines and is 160 GB. We tried to run the cluster command, but it depleted the memory so much that we were afraid the computer would shut off (it’s happened before), so we aborted the command. Since hcluster should have a smaller memory footprint, we decided to try that. However, the next morning the computer showed an error message that said “not enough disc space”. The mothur log said, “It took 70844 seconds to sort.” We’re running this on a MacPro with two dual-core processors and 9 GB of RAM. We’re using mothur 19.
We’re wondering what specifications are needed to run a dataset of this magnitude or larger (possibly 4-5 times larger). Any suggestions would be appreciated.
Bonnie and Diane