I have some questions about the theory behind UniFrac in mothur. For my samples, I followed the MiSeq OTU-based analyses and ran UniFrac using the tree generated by the tree.shared command as input. As I understand it, the distance file used as input for the tree.shared command is generated by calculating distances between samples based on OTU presence/absence or relative abundance (depending on the calculator you pick), not by looking at sequence similarity.
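To make concrete what I mean by a sample-level distance, here is a minimal Python sketch of a Jaccard-style presence/absence distance between two samples. The OTU table, the sample names, and the function name are all made up for illustration, and I am not claiming this matches any particular mothur calculator exactly.

```python
# Hypothetical OTU counts per sample (illustrative values only).
otu_counts = {
    "sampleA": {"Otu1": 10, "Otu2": 0, "Otu3": 5},
    "sampleB": {"Otu1": 3, "Otu2": 7, "Otu3": 0},
}


def jaccard_distance(counts_a, counts_b):
    """Presence/absence distance: 1 - (shared OTUs / OTUs seen in either sample)."""
    present_a = {otu for otu, n in counts_a.items() if n > 0}
    present_b = {otu for otu, n in counts_b.items() if n > 0}
    union = present_a | present_b
    if not union:
        # Two empty samples: treat them as identical.
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


distance = jaccard_distance(otu_counts["sampleA"], otu_counts["sampleB"])
print(round(distance, 3))  # one shared OTU out of three observed
```

Note that nothing in this calculation uses the sequences themselves; only the OTU table enters.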
However, when I talked to a colleague today, he seemed confused by the mothur approach. He mentioned that the original idea behind UniFrac was to take a phylogenetic tree containing all the reads, plus a group file mapping each read to its group, and determine whether the groups are evenly spread across the tree or not, taking the hierarchy into account. In his opinion, the tree we are reading into the UniFrac command in mothur is instead a dendrogram describing similarity between samples.
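To illustrate the read-level idea my colleague described, here is a minimal Python sketch of an unweighted UniFrac-style calculation: the fraction of branch length that leads only to reads from one of the two groups. The toy tree, the read and group names, and the function names are hypothetical, and this is only my understanding of the concept, not how mothur implements it.

```python
# Toy phylogenetic tree of reads, stored as child -> (parent, branch_length).
# Node names and branch lengths are made up for illustration.
tree = {
    "read1": ("n1", 1.0),
    "read2": ("n1", 1.0),
    "read3": ("n2", 1.0),
    "read4": ("n2", 1.0),
    "n1": ("root", 2.0),
    "n2": ("root", 2.0),
}


def groups_below(tree, read_to_group):
    """For each branch (named by its child node), collect the groups of the reads beneath it."""
    below = {node: set() for node in tree}
    for read, group in read_to_group.items():
        node = read
        while node in tree:  # walk up until we reach the root
            below[node].add(group)
            node = tree[node][0]
    return below


def unweighted_unifrac(tree, read_to_group, group_a, group_b):
    """Branch length unique to one group divided by branch length covered by either group."""
    below = groups_below(tree, read_to_group)
    unique = 0.0
    total = 0.0
    for node, (_parent, length) in tree.items():
        present = below[node] & {group_a, group_b}
        if not present:
            continue  # branch leads to neither group
        total += length
        if len(present) == 1:
            unique += length
    return unique / total


# Groups sit in separate clades: every branch is unique to one group.
segregated = {"read1": "A", "read2": "A", "read3": "B", "read4": "B"}
# Groups are interleaved: only the tip branches are unique.
mixed = {"read1": "A", "read2": "B", "read3": "A", "read4": "B"}

print(unweighted_unifrac(tree, segregated, "A", "B"))  # 1.0
print(unweighted_unifrac(tree, mixed, "A", "B"))       # 0.5
```

Here the leaves are individual reads and the group file only labels them, which is different from a dendrogram whose leaves are the samples themselves.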
I wonder if anybody has run into this question before. It might have a very simple explanation, but we can't seem to figure it out.