Theory behind UniFrac in mothur


I have some questions about the theory behind UniFrac in mothur. For my samples, I followed the OTU-based analyses in the MiSeq SOP and ran unifrac with the tree generated by the tree.shared command as input. As I understand it, the distance file used as input for tree.shared is generated by calculating distances between samples from OTU presence/absence or relative abundance (depending on the calculator you pick), not from sequence similarity.
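As a rough illustration of what such a sample-to-sample distance looks like, here is a minimal Python sketch of two textbook calculators, Jaccard (presence/absence) and Bray-Curtis (abundance), computed from OTU counts per sample. The sample names and counts are hypothetical, and mothur's own calculators (for example `jclass` and `braycurtis`) may differ in implementation details. The point is only that no sequence similarity enters the calculation.

```python
from itertools import combinations

# Hypothetical OTU counts per sample (one row per sample, one column per OTU).
otu_table = {
    "A": [10, 0, 5, 3],
    "B": [8, 2, 0, 3],
    "C": [0, 7, 4, 0],
}


def jaccard_distance(a, b):
    """Presence/absence: 1 - (OTUs in both samples) / (OTUs in either sample)."""
    present_a = {i for i, n in enumerate(a) if n > 0}
    present_b = {i for i, n in enumerate(b) if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis_distance(a, b):
    """Abundance-based: sum |a_i - b_i| / sum (a_i + b_i)."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Every pairwise comparison uses only the OTU columns, never the sequences themselves.
for s1, s2 in combinations(otu_table, 2):
    print(s1, s2,
          round(jaccard_distance(otu_table[s1], otu_table[s2]), 3),
          round(bray_curtis_distance(otu_table[s1], otu_table[s2]), 3))
```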

However, when I talked to a colleague today, he was confused by the mothur approach. He pointed out that the original idea behind UniFrac is to take a phylogenetic tree containing all the reads, plus a group file mapping each read to its group, and determine whether the groups are evenly spread across the tree, taking its hierarchical structure into account. In his view, the tree we are reading into the unifrac command in mothur is instead a dendrogram describing similarity between samples.
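A minimal sketch of the unweighted UniFrac idea your colleague describes, assuming a rooted tree of reads stored as a hypothetical `node -> (parent, branch length)` dictionary and a group mapping of read to sample. The distance is the branch length leading only to reads from one of the two samples, divided by the branch length leading to reads from either sample. This is not mothur's implementation, and it does no significance testing.

```python
from collections import defaultdict

# Hypothetical rooted tree of reads: node -> (parent, branch length to parent).
# "root" has no entry; leaves are read IDs.
tree = {
    "read1": ("n1", 0.1),
    "read2": ("n1", 0.2),
    "read3": ("n2", 0.3),
    "read4": ("n2", 0.1),
    "n1": ("root", 0.4),
    "n2": ("root", 0.5),
}

# Hypothetical group mapping: read -> sample it came from.
groups = {"read1": "A", "read2": "A", "read3": "B", "read4": "A"}


def unweighted_unifrac(tree, groups, g1, g2):
    """Unique branch length / branch length leading to reads of g1 or g2."""
    # For every node, record which of the two groups occur among its descendant reads.
    below = defaultdict(set)
    for read, group in groups.items():
        if group not in (g1, g2):
            continue
        node = read
        while node in tree:  # walk up until the root, which has no parent entry
            below[node].add(group)
            node = tree[node][0]
    unique = shared = 0.0
    for node, found in below.items():
        branch_length = tree[node][1]
        if len(found) == 1:
            unique += branch_length  # branch leads to reads of only one sample
        else:
            shared += branch_length  # branch leads to reads of both samples
    total = unique + shared
    return unique / total if total else 0.0


print(unweighted_unifrac(tree, groups, "A", "B"))
```

With these toy values, read4 (sample A) sits next to read3 (sample B) under n2, so that branch counts as shared and pulls the distance below 1.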

I wonder whether anybody has run into this before. It might have a very simple explanation, but we can’t seem to figure it out.


I’d say your colleague is right. UniFrac distances should be calculated from a phylogenetic tree of sequence data. You can build one from your samples using the dist.seqs/clearcut combination to get a neighbor-joining tree, and then use that tree as the input for your UniFrac distances. I wouldn’t use the output of tree.shared as the input for unifrac, since that command uses UPGMA to build a tree of samples from the sample-to-sample distances, not a tree of sequences.
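For contrast, here is a minimal UPGMA (average-linkage) sketch that turns a sample-by-sample distance matrix into a Newick dendrogram. It is not mothur's tree.shared code, and the distances are hypothetical; it only shows that every leaf of such a tree is a sample rather than a sequence, so there is no read-level phylogeny for UniFrac to work on.

```python
from itertools import combinations


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering of samples into a Newick string.

    dist maps frozenset({x, y}) -> distance between samples x and y.
    """
    # Each cluster: label -> (Newick text, number of samples, node height).
    clusters = {name: (name, 1, 0.0) for name in names}
    d = dict(dist)
    step = 0
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        newick_a, size_a, height_a = clusters.pop(a)
        newick_b, size_b, height_b = clusters.pop(b)
        height = d[frozenset((a, b))] / 2
        merged = f"({newick_a}:{height - height_a:.3f},{newick_b}:{height - height_b:.3f})"
        step += 1
        label = f"cluster{step}"  # internal label, assumed not to clash with sample names
        # Distance to the new cluster is the size-weighted mean of the old distances.
        for other in clusters:
            d[frozenset((label, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[label] = (merged, size_a + size_b, height)
    (newick, _, _), = clusters.values()
    return newick + ";"


# Hypothetical sample-to-sample distances (e.g. Jaccard or Bray-Curtis).
distances = {
    frozenset(("A", "B")): 0.2,
    frozenset(("A", "C")): 0.6,
    frozenset(("B", "C")): 0.8,
}
print(upgma(["A", "B", "C"], distances))
# (C:0.350,(A:0.100,B:0.100):0.250);
```

Because the leaves here are A, B and C themselves, a group file mapping reads to samples has nothing to attach to.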

The unifrac commands in mothur (and QIIME) simply take a valid tree file and run the analysis on it, so you won’t see an error if you pass the output of tree.shared to them, but the output won’t be very informative.
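To see why, here is a sketch that applies the standard unweighted UniFrac definition to a small dendrogram of samples, assuming each leaf is treated as its own group (a hypothetical setup; mothur's handling of such a tree may differ). With one leaf per sample, the score depends only on the dendrogram's branch lengths, and any two samples whose only common ancestor is the root get the maximum value of 1.0.

```python
from collections import defaultdict

# Hypothetical dendrogram of samples: node -> (parent, branch length).
# Every leaf is a whole sample, not a read.
sample_tree = {
    "A": ("n1", 0.1),
    "B": ("n1", 0.1),
    "n1": ("root", 0.25),
    "C": ("root", 0.35),
}
# Assumption: each leaf is its own group, since there are no reads to map.
groups = {leaf: leaf for leaf in ("A", "B", "C")}


def unweighted_unifrac(tree, groups, g1, g2):
    """Unique branch length / branch length leading to leaves of g1 or g2."""
    # Record which of the two groups occur below each node.
    below = defaultdict(set)
    for leaf, group in groups.items():
        if group not in (g1, g2):
            continue
        node = leaf
        while node in tree:  # walk up until the root, which has no parent entry
            below[node].add(group)
            node = tree[node][0]
    unique = sum(tree[n][1] for n, found in below.items() if len(found) == 1)
    shared = sum(tree[n][1] for n, found in below.items() if len(found) > 1)
    total = unique + shared
    return unique / total if total else 0.0


for pair in (("A", "B"), ("A", "C"), ("B", "C")):
    print(pair, round(unweighted_unifrac(sample_tree, groups, *pair), 3))
# ('A', 'B') 0.444
# ('A', 'C') 1.0
# ('B', 'C') 1.0
```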

Thank you for the clear response! In that case, it is a bit confusing why the MiSeq SOP uses unifrac in the OTU-based analyses.

We use unifrac in two places in the SOP. In the OTU-based analysis, it is used to analyze a tree of samples; at the end of the SOP, in the phylogenetic analysis section, it is used to analyze a tree of sequences.