Hello, I have been using clearcut to make phylogenetic trees for unifrac analysis for a while now, but since moving most of my work to our university’s server I have not been able to get the memory usage right. I have been getting a bus error that states “line 14: 22409 Bus error (core dumped)”. I am pretty sure this means my job is getting killed for exceeding the requested memory. Do you know how much memory I would need to run the clearcut command?
On a somewhat related note, this issue has started a discussion in my lab about whether we should be building our own phylogenetic trees from our data or using peer-reviewed phylogenetic trees (like those from RDP) for unifrac calculations. Any thoughts on that would be appreciated as well.
I suspect you are running out of RAM. You might try using a single core and requesting as much RAM as you can get from the cluster.
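If it helps to put a number on the request, one option is to run the job on a smaller input first and record its peak memory. The snippet below is only a rough sketch using Python's standard library on a Unix-like system; the helper name `peak_child_memory_kb` and the demo command are placeholders I made up, not part of mothur or clearcut, so substitute your real tree-building call.

```python
import resource
import subprocess
import sys


def peak_child_memory_kb(command):
    """Run `command`, wait for it, and return the peak resident set size
    recorded for finished child processes of this script.

    Linux reports ru_maxrss in kilobytes (macOS reports bytes).
    """
    subprocess.run(command, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Placeholder command (an assumption for illustration): a child process
# that allocates roughly 50 MB. Replace it with your own command line.
demo_command = [sys.executable, "-c", "x = bytearray(50 * 1024 * 1024)"]

peak = peak_child_memory_kb(demo_command)
print(f"peak memory of child process: {peak} (kilobytes on Linux)")
```

Measuring a few input sizes this way gives a sense of how memory grows with the number of sequences before you commit to a large request on the cluster.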
I would discourage the use of reference trees where you map your sequences to the reference tree. This uses closed reference clustering, which we have shown to have big problems using current algorithms. See here and here.
Finally, I’d encourage you to perhaps take a step back and ask why you need to pursue phylogenetic approaches. I have yet to see a case where an OTU-based beta-diversity analysis conflicted with a phylogenetic one when controlling for whether the metric is measuring membership (e.g. unweighted unifrac, jaccard) or structure (e.g. weighted unifrac, bray-curtis). Given all the extra challenges of building a meaningful tree for your sequences, I’d go with the OTU-based methods.
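To make the membership versus structure distinction concrete, here is a small sketch in plain Python using the textbook definitions of the Jaccard and Bray-Curtis distances on OTU counts. The function names and the sample counts are invented for illustration; this is not mothur's implementation.

```python
def jaccard_distance(a, b):
    """Membership metric: only the presence or absence of each OTU matters."""
    present_a = {i for i, n in enumerate(a) if n > 0}
    present_b = {i for i, n in enumerate(b) if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis_distance(a, b):
    """Structure metric: differences in OTU abundance matter."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical OTU counts (one entry per OTU) for three samples.
sample_a = [90, 5, 5, 0]
sample_b = [5, 5, 90, 0]   # same OTUs as sample_a, very different abundances
sample_c = [90, 5, 5, 1]   # one extra rare OTU, otherwise identical to sample_a

print(jaccard_distance(sample_a, sample_b))      # 0.0
print(bray_curtis_distance(sample_a, sample_b))  # 0.85
print(jaccard_distance(sample_a, sample_c))      # 0.25
print(bray_curtis_distance(sample_a, sample_c))  # about 0.005
```

Samples a and b share exactly the same OTUs, so the membership metric sees no difference while the structure metric sees a large one; a and c show the reverse pattern. That is why comparisons between OTU-based and phylogenetic results should be made within the same class of metric.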