I would like to run unifrac.weighted to get weighted UniFrac distances from my dataset. I have my count table, but I need some help figuring out what to use for my tree file.
I used Silva to align my sequences. Can I use the .arb files associated with Silva for my tree file? Or do I need to use clearcut to construct a tree from my own data? I would love some assistance understanding what I should use and why.
The unifrac commands just require a Newick-formatted tree file, so any software that gives you one will work for this command. You can create a neighbor-joining tree in mothur using the clearcut command. Another option would be to use FastTree to create a pseudo-ML tree.
As long as you use the fasta file that corresponds with your count table, the results of any tree builder should work fine.
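If you want to confirm that a fasta file and a count table really do correspond before building a tree, a quick check of the sequence names can help. The following is only a minimal Python sketch, not part of mothur: the function names are made up for illustration, and it assumes the count table is tab-delimited with sequence names in the first column, one column-header row, and optional comment lines starting with "#".

```python
def fasta_ids(lines):
    """Collect sequence names from FASTA header lines (text after '>' up to whitespace)."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def count_table_ids(lines):
    """Collect names from the first column of a tab-delimited count table.

    Assumption: lines starting with '#' are comments, and the first
    remaining line is the column-header row, which is skipped.
    """
    ids = set()
    header_seen = False
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        if not header_seen:
            header_seen = True  # skip the column-header row
            continue
        ids.add(line.split("\t")[0])
    return ids


def compare_ids(fasta_lines, count_lines):
    """Return (names only in the fasta, names only in the count table), both sorted."""
    in_fasta = fasta_ids(fasta_lines)
    in_count = count_table_ids(count_lines)
    return sorted(in_fasta - in_count), sorted(in_count - in_fasta)
```

To use it, open both files and pass the file objects to compare_ids. If both returned lists are empty, the two files name the same set of sequences, so a tree built from that fasta should have leaves matching the count table.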
Can I use the .arb files associated with Silva for my tree file? Or do I need to use clearcut to construct a tree from my own data?
I have been trying to use clearcut to construct a tree, but it has been running for well over 7 days on my computer. My understanding is that FastTree is less computationally intensive, but I’m not sure whether that means the quality is compromised.
I would appreciate any additional insight you can provide.
You can’t use the ARB files directly, but if you have a tree inside your ARB database you can export it (Tree -> Tree Admin -> Export) and use the exported tree.
How many sequences are you trying to build your tree from? FastTree basically builds a BIONJ tree and then refines the tree using ML criteria, so I wouldn’t expect it to be any quicker than clearcut.
I have quite a few sequences, and I think that is part of my problem at the moment. I was running clearcut on the fasta file generated immediately before the dist.seqs step in the MiSeq SOP. It seems to me that I instead need to build a tree from representative sequences for each OTU in my dataset. It looks like I can use get.oturep for this.
Does this seem reasonable to you? Any other suggestions? I appreciate the help!
One thing you could try to speed the process up is to run dist.seqs before clearcut. The reason for this is that dist.seqs can be split over multiple processors, whereas clearcut runs in a single process. I’m not sure how clearcut handles a fasta input, but given your run time I’m assuming that it uses a single thread.
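To get a feel for why the number of sequences matters so much here, it helps to count the pairwise distances a full distance matrix has to hold, which is n(n-1)/2 for n sequences. This is just a rough Python sketch of that arithmetic; the sequence counts are made-up examples, not numbers from your dataset, and it says nothing about the actual run time of dist.seqs or clearcut.

```python
def pair_count(n_seqs):
    """Number of unique pairwise distances among n_seqs sequences: n(n-1)/2."""
    return n_seqs * (n_seqs - 1) // 2


# Hypothetical sizes: all unique sequences vs. one representative per OTU.
for n in (100_000, 5_000):
    print(f"{n} sequences -> {pair_count(n)} pairwise distances")
```

Because the count grows with the square of the number of sequences, cutting the input down to one representative per OTU shrinks the problem far more than proportionally.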