mothur

Phylogenetic methods for large datasets

Hi,
I’m applying a phylogenetic method to a large dataset, following the MiSeq SOP, in order to calculate Faith’s phylogenetic diversity (phylo.diversity command).

However, the resulting distance matrix in PHYLIP format is 539 GB. The clearcut step has now been running for more than 7 days and still has not finished. I’m not sure how to handle such a large file in clearcut. Could you please advise?

Thank you very much.

How much RAM do you have?

I used 16 GB of RAM (total memory is 757 GB; with processors=16, memory usage was about 40%). The steps I ran are below:

dist.seqs(fasta=finalforphy.fasta, output=lt, processors=16)
clearcut(phylip=finalforphy.phylip.dist)

We also tried another server that runs Slurm (10 nodes, 3 TB of memory per node). I requested 1 TB of memory (72 cores), but jobs there are limited to five days of run time (a server usage limit), and the job still did not complete.

I believe clearcut needs as much RAM as the distance matrix for each processor. But frankly, I’ve never gotten a big dataset to finish clearcut. I just don’t do it.

For what it’s worth, we’ve yet to see results from using phylogenetic methods disagree with those from bin-based methods. I would really suggest just using a bin-based method.

Pat
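To put the RAM point above into rough numbers, here is a minimal Python sketch that estimates how many sequences a lower-triangle PHYLIP file of a given size implies, and how much memory the same matrix would take as binary values. The 7 bytes per text entry (a value such as "0.1234" plus a separator) and the 4 or 8 bytes per in-memory value are assumptions for illustration, not figures taken from mothur or clearcut; sequence names at the start of each row are ignored. The function names are made up for this example.

```python
import math


def pairs(n_seqs: int) -> int:
    """Number of entries in a lower-triangle matrix without the diagonal."""
    return n_seqs * (n_seqs - 1) // 2


def seqs_from_file_size(file_bytes: float, bytes_per_entry: float = 7.0) -> int:
    """Rough number of sequences implied by a lower-triangle text file.

    bytes_per_entry is an assumed average size of one written distance.
    """
    n_pairs = file_bytes / bytes_per_entry
    # Solve n * (n - 1) / 2 = n_pairs for n.
    return int((1 + math.sqrt(1 + 8 * n_pairs)) / 2)


def matrix_ram_gb(n_seqs: int, bytes_per_value: int = 4) -> float:
    """Memory in GB (10^9 bytes) to hold the lower triangle as binary values."""
    return pairs(n_seqs) * bytes_per_value / 1e9


if __name__ == "__main__":
    n = seqs_from_file_size(539e9)  # the 539 GB file from this thread
    print("approx. sequences:", n)
    print("approx. GB as 4-byte floats:", round(matrix_ram_gb(n, 4)))
    print("approx. GB as 8-byte doubles:", round(matrix_ram_gb(n, 8)))
```

Whatever the exact per-entry sizes are, the number of distances grows with the square of the number of sequences, so halving the number of unique sequences cuts the matrix to roughly a quarter.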

No problem. I agree that a phylogenetic tree is not the optimal approach for a large dataset. Thank you very much for your suggestion. I normally use a bin-based method to analyse all of my data. However, it was suggested to me that a phylogenetic method, Faith’s phylogenetic diversity, might give additional insights into the data ^^

For us, PD is almost perfectly correlated with the number of OTUs observed in a sample.

Pat
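If you want to check the PD versus OTU-count relationship on a smaller subset of your own data, a minimal Python sketch follows. It assumes you have already collected, per sample, the observed OTU count and the PD value into two lists in the same sample order (for example from the summary files that mothur writes); the `pearson` helper is written for this example, and the numbers in the usage section are made-up placeholders, not results.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Placeholder values for illustration only; replace with your own
    # per-sample observed OTU counts and PD values.
    sobs = [120, 340, 560, 210]
    pd_values = [10.5, 28.0, 47.5, 18.0]
    print("Pearson r:", pearson(sobs, pd_values))
```

A value close to 1 on your own samples would suggest that PD adds little beyond the OTU count for that dataset.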
