Checkpoint in chimera.perseus (to pick up where it left off)

I use chimera.perseus for chimera checking. Depending on the dataset, the process can be quite lengthy, sometimes more than 24 hours.

A couple of times it has been stopped prematurely, either by time limits on our servers or by network issues causing a disconnect. This is frustrating because the whole shebang then has to start over.

It appears to check for chimeras sample by sample (in alphabetical order), presumably appending them to the .chimeras and .accnos files as it goes. Is there any way to introduce a checkpoint so that, if the run has to be stopped, it can resume from where it left off?

Last time it fell over, the most recently updated temp files in the directory were “stability.trim.contigs.good.unique.good.filter.unique.precluster.perseus.chimerasPBRLEE” and “stability.trim.contigs.good.unique.good.filter.unique.precluster.perseus.accnosPBRLEE”, PBRLEE being the sample that was in progress when the job aborted. However, I couldn’t find any instructions for resuming the process after it was interrupted.
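To make the request concrete, here is a minimal sketch of the sample-level checkpointing described above. It is not mothur code and does not use any mothur option: `run_with_checkpoint`, the per-sample `.accnos` file layout and the `check_sample` callback (a hypothetical stand-in for whatever performs the Perseus check on one sample) are all assumptions for illustration. A sample only counts as finished once its result file has been written under a temporary name and atomically renamed, so a job killed mid-sample (as with PBRLEE) simply reruns that one sample on restart.

```python
import os
from pathlib import Path
from typing import Callable, Iterable, List


def run_with_checkpoint(
    samples: Iterable[str],
    check_sample: Callable[[str], List[str]],  # hypothetical per-sample checker
    work_dir: str,
    final_accnos: str,
) -> List[str]:
    """Check each sample once, skipping samples finished in an earlier run.

    Returns the samples actually processed during this call.
    """
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    ordered = sorted(samples)  # alphabetical, matching the order seen in the post
    processed = []
    for sample in ordered:
        done_file = work / f"{sample}.accnos"
        if done_file.exists():
            continue  # checkpoint hit: this sample completed previously
        names = check_sample(sample)  # may raise if the job dies here
        tmp_file = work / f"{sample}.accnos.tmp"
        with open(tmp_file, "w") as fh:
            for name in names:
                fh.write(name + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # make sure the data is on disk before renaming
        os.replace(tmp_file, done_file)  # atomic rename marks the sample as done
        processed.append(sample)
    # Every sample is done: merge the per-sample results in sample order.
    with open(final_accnos, "w") as out:
        for sample in ordered:
            out.write((work / f"{sample}.accnos").read_text())
    return processed
```

Keeping one result file per sample, rather than appending straight to a single combined file, avoids duplicated entries if the job dies after writing results but before recording that the sample is complete.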

I think this would be a very useful feature for anyone who, like me, suffers premature job death on their servers, or for anyone who needs to stop the chimera checking and resume it later.



Sorry, this is likely to be pretty low on the list of features to incorporate unless we find a good reason to favor chimera.perseus over chimera.uchime. The Perseus algorithm can be pretty computationally demanding, doesn’t seem to perform any better than uchime, and requires a training set to run.