Phylogenetic methods on a large dataset

I’m applying the phylogenetic method to a large dataset, following the MiSeq SOP, to analyse Faith’s phylogenetic diversity (phylo.diversity command).

However, my distance file in PHYLIP format (dist-phylip) is 539G. The clearcut step has now been running for more than 7 days and still hasn’t finished. I’m not sure how to handle such a big file in clearcut. Could you please give me some suggestions?
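The 539G figure is less surprising once you count entries: a lower-triangle PHYLIP matrix stores one distance per pair of sequences, so it grows with n(n-1)/2. The Python sketch below writes that layout and estimates the file size. The 7 bytes per entry and the plain `name d1 d2 ...` row formatting are assumptions for illustration, not mothur’s exact output format, and `lower_triangle_phylip`/`estimated_file_bytes` are hypothetical helpers.

```python
# Minimal sketch (assumed layout): a lower-triangle PHYLIP distance matrix,
# the style of file dist.seqs writes with output=lt. mothur's exact number
# formatting and column widths are not reproduced here.
from typing import Callable, List


def lower_triangle_phylip(names: List[str], dist: Callable[[int, int], float]) -> str:
    """Header line with n, then row i = name followed by distances to rows 0..i-1."""
    lines = [str(len(names))]
    for i, name in enumerate(names):
        row = [f"{dist(i, j):.4f}" for j in range(i)]  # diagonal omitted
        lines.append(" ".join([name] + row))
    return "\n".join(lines) + "\n"


def n_pairwise_entries(n: int) -> int:
    """Distances stored in a lower triangle of n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


def estimated_file_bytes(n: int, bytes_per_entry: int = 7) -> int:
    # Hypothetical 7 bytes per entry, e.g. "0.1234" plus one separator;
    # sequence names are ignored because they grow only linearly with n.
    return n_pairwise_entries(n) * bytes_per_entry


if __name__ == "__main__":
    print(lower_triangle_phylip(["seqA", "seqB", "seqC"], lambda i, j: 0.05 * (i + j)))
    print(estimated_file_bytes(400_000) / 1e9, "GB (rough estimate)")
```

Under that assumption, a hypothetical 400,000 sequences gives 79,999,800,000 distances and about 5.6e11 bytes (roughly 560 GB), and doubling the number of sequences roughly quadruples the file.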

How much RAM do you have?

I used 16G of RAM (total memory is 757G; with processors=16, %mem used was 40%). The step I ran is below:

dist.seqs(fasta=finalforphy.fasta, output=lt, processors=16)

However, we also tried another server running Slurm (10 nodes, 3TB memory per node). I requested 1TB of memory (72 cores), but jobs can only run for five days (a server usage limit), and the job still did not complete.

I believe clearcut needs as much RAM as the distance matrix for each processor. But frankly, I’ve never gotten clearcut to finish on a big dataset. I just don’t do it.
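That rule of thumb can be turned into a back-of-envelope estimate. The Python sketch below assumes, without taking it from clearcut’s source, that the matrix is held as a full n x n array of 4-byte floats and that each processor keeps its own copy; `matrix_bytes` and `total_ram_gib` are hypothetical helpers.

```python
# Back-of-envelope RAM estimate. Assumptions (not from clearcut's source):
# the matrix is held as an n x n array of 4-byte floats and every processor
# keeps its own copy, following the rule of thumb quoted in the thread.


def matrix_bytes(n: int, bytes_per_value: int = 4, full_square: bool = True) -> int:
    """Bytes for one in-memory copy of the distance matrix."""
    entries = n * n if full_square else n * (n - 1) // 2
    return entries * bytes_per_value


def total_ram_gib(n: int, processors: int, bytes_per_value: int = 4,
                  full_square: bool = True) -> float:
    """Estimated total RAM in GiB if each processor holds one copy."""
    return processors * matrix_bytes(n, bytes_per_value, full_square) / 2**30


if __name__ == "__main__":
    for procs in (1, 16, 72):
        print(procs, "processor(s):", round(total_ram_gib(400_000, procs), 1), "GiB")
```

Under these assumptions, a hypothetical n = 400,000 already needs about 596 GiB for a single full copy, which is one way to see why even a 1TB allocation can be tight once more than one copy is in memory.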

For what it’s worth, we’ve yet to see results from using phylogenetic methods disagree with those from bin-based methods. I would really suggest just using a bin-based method.


No problem. I also think that a phylogenetic tree is not an optimal method for a large dataset. Thank you very much for your suggestion. Normally, I use a bin-based method to analyse all data. However, I got a suggestion to try the phylogenetic method using Faith’s phylogenetic diversity, which may give additional insights into the data ^^

For us, PD is almost exactly correlated with the number of OTUs observed in a sample.
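A small sketch of what Faith’s PD measures may help explain that correlation: it sums the branch lengths connecting the taxa present in a sample to the root, so adding OTUs usually adds branch length. The Python below uses a hypothetical toy tree stored as child -> (parent, branch length) and the rooted textbook definition; it is not mothur’s phylo.diversity implementation.

```python
# Rooted Faith's PD on a hypothetical toy tree (textbook definition, not
# mothur's phylo.diversity code). Each node maps to (parent, branch length
# to that parent); the root's parent is None.
from typing import Dict, Optional, Set, Tuple

Tree = Dict[str, Tuple[Optional[str], float]]

toy_tree: Tree = {
    "root": (None, 0.0),
    "X": ("root", 1.0),  # internal node shared by A and B
    "A": ("X", 0.5),
    "B": ("X", 0.5),
    "D": ("root", 3.0),
}


def faith_pd(tree: Tree, present: Set[str]) -> float:
    """Sum branch lengths from each present tip up to the root, counting shared branches once."""
    counted: Set[str] = set()
    total = 0.0
    for tip in present:
        node: Optional[str] = tip
        while node is not None and node not in counted:
            parent, length = tree[node]
            if parent is None:  # the root has no branch above it
                break
            counted.add(node)
            total += length
            node = parent
    return total


if __name__ == "__main__":
    for sample in ({"A"}, {"A", "B"}, {"A", "D"}, {"A", "B", "D"}):
        print(sorted(sample), "OTUs:", len(sample), "PD:", faith_pd(toy_tree, sample))
```

On this toy tree, going from 1 to 2 to 3 OTUs ({A}, {A, B}, {A, B, D}) raises PD from 1.5 to 2.0 to 5.0, while {A, D} (also 2 OTUs) scores 4.5, so PD tracks richness closely without being identical to it.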


