clearcut running 16+ days

What should the expected run time for clearcut even be?

Input was:

Output File Names:
run066_09122016.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.phylip.dist

It took 15934 seconds to calculate the distances for 230005 sequences.

Is it even possible to do a phylogeny-based analysis on this number of sequences?
If the number of sequences is the problem, does it make sense to subsample first, then recalculate the distance matrix and run clearcut?
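
For a sense of scale, here is a rough Python sketch of how the number of pairwise distances grows with the number of sequences. The function names are made up for illustration, and the 4 bytes per value for an in-memory matrix is purely an assumption; I don't know how clearcut actually stores the matrix.

```python
def pairwise_count(n):
    """Number of distinct sequence pairs, i.e. entries in a lower-triangle matrix."""
    return n * (n - 1) // 2


def full_matrix_gib(n, bytes_per_value=4):
    """Approximate size in GiB of a full n x n matrix (bytes_per_value is an assumption)."""
    return n * n * bytes_per_value / 2**30


# 230005 is the sequence count reported above; the smaller sizes are
# hypothetical subsampling targets, not recommendations.
for n in (230005, 100000, 50000, 10000):
    print(f"{n:>7} seqs: {pairwise_count(n):>14,} pairs, "
          f"~{full_matrix_gib(n):,.1f} GiB as a full matrix")
```

Since the pair count grows with the square of the number of sequences, halving the input cuts the number of distances to roughly a quarter.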

I never managed to get ~300k sequences through clearcut on a machine with 256 GB of RAM. I can’t remember how long I let it run, probably less than 2 weeks, because I couldn’t justify more machine time for an operation that I didn’t know would work. If you have a cluster where they’ll let you run a node for a very long time, maybe you could walk away, come back in a while, and see if it finished. I believe a significant part of the problem with V4 data is not just the number of sequences but the fact that there isn’t much difference between the sequences.

You could cluster into 5% OTUs rather than 3%; that should drop your number significantly.

Thanks for the info.

It’s been running or idling for ~3 weeks now… I had 48 cores enabled with basically as much memory as needed (up to 500 GB). In the beginning I think I saw multiple processes running, but now there’s just one process and it’s barely using 100 GB. There’s no output, so I can’t really say whether the process is actually progressing or just hanging somewhere.
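
One way to tell "computing" from "hanging" without any program output is to check whether the process is still accumulating CPU time. The sketch below is Linux-only and reads `/proc/<pid>/stat`; the function names are hypothetical, and the PID would be that of whichever process is running clearcut (presumably the mothur process, if it was launched from within mothur).

```python
import os
import time


def cpu_seconds_from_stat(stat_text, ticks_per_second):
    """Return user + system CPU seconds from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may contain spaces,
    # so cut at the last ')' before splitting the remaining fields.
    fields = stat_text.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(fields[11]), int(fields[12])
    return (utime + stime) / ticks_per_second


def cpu_seconds(pid):
    """Read the CPU time consumed so far by a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as handle:
        return cpu_seconds_from_stat(handle.read(), os.sysconf("SC_CLK_TCK"))


def is_using_cpu(pid, interval=5.0):
    """True if the process accumulated any CPU time during the interval."""
    before = cpu_seconds(pid)
    time.sleep(interval)
    return cpu_seconds(pid) > before
```

If `is_using_cpu(pid)` keeps returning `True`, the process is at least doing work, although that says nothing about how close it is to finishing. A process that is stuck waiting on disk or swap would show little or no increase.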

I think I’ll try to restart it and also use the verbose option to see what is really going on…

PS. I don’t see the “processors” parameter for this command. Is parallelization even possible?

It isn’t parallelized.

Restarted. It has been running for 20 days now, idling again at 100 GB of memory.

The verbose output is also very “helpful”:

PRNG SEED: 1483975151

Oh well. I guess it’s not doable or whatever.

FWIW, I have yet to see a case where results from a phylogenetic method didn’t match those from an OTU-based method.

dnasaurus, did clearcut ever finish running? I am running into a similar predicament and was wondering how long it took for yours to run. I have 95K sequences (the .dist file is 31.6 GB) and clearcut has been running idly for 8 days now.

Will running it on a server with more cores/RAM speed this up at all?

No, it never finished. I killed it after a few more days since there was no indication as to what was going on (if anything was going on at all). Sorry.