I have been using mgcluster on a Mac OS X machine with 8 processors and 8 GB of RAM. I've been running the following mgcluster command:
mgcluster(blast=HCM_virome_pooled_vs_itself.blastp, name=HCM_virome_pooled.names, method=furthest, cutoff=0.30)
The blastp input file is about 1.4 GB. The run has been going for about four weeks now, and rather than speeding up over time, it seems to be slowing down (it has clustered down to a cutoff of 0.20 so far). Since this is taking forever, I'm wondering whether there may be an issue with the software, or whether this is simply the nature of the beast. If it's the latter, would it be possible to implement a way to run mgcluster on multiple processors in parallel? I'm also wondering whether memory might be limiting the process. I've heard it may be possible to write the distance matrix to a file rather than holding it in memory; if so, is there a way to do that in mgcluster?
I'm just starting to use mgcluster, and it seems like it could have great applications for what I'm interested in doing, but I'm trying to figure out my options for dealing with the speed issue.