I have a basic question about weighted versus unweighted UniFrac, since the two give me very different results. How does mothur take abundance into account for the weighted analysis? Are only identical sequences grouped together? How does this compare with the unweighted analysis? I built an ARB tree of ALL my sequences (not OTUs) and used it as the basis for the UniFrac analysis. The results were very different, and I am trying to figure out what this means.
(my gut feeling is that the weighted is best since I am comparing very different communities, but I need more than my gut feeling to persuade my boss…)
The unweighted vs. weighted question is probably best directed to Rob Knight and his crew. The two should give different results, since the unweighted measure only looks at the fraction of branch length that is unique to a community, whereas the weighted measure weights branches by relative abundance. I’ve gotten the impression that the weighted approach does some screwy things. I’d suggest sticking with the unweighted approach.
Hope this helps,
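For anyone following along, here is a minimal sketch of the difference on a toy tree. It follows the published definitions as I understand them (unweighted: the fraction of branch length that leads to only one community; weighted: branch lengths multiplied by the difference in relative abundance). It is not mothur's code, and the tree, the counts, and the function names are all made up for illustration. The weighted value is the raw, unnormalized form, so it need not be on the same scale as what a given program reports.

```python
# Toy tree, flattened to a list of branches. Each branch is
# (length, copies from community A below it, copies from community B below it).
# All numbers are invented for illustration.
branches = [
    (1.0, 9, 0),   # leaf s1: abundant, only in A
    (1.0, 1, 1),   # leaf s2: one copy in each community
    (1.0, 0, 1),   # leaf s3: rare, only in B
    (1.0, 0, 8),   # leaf s4: abundant, only in B
    (1.0, 10, 1),  # internal branch above s1 and s2
    (1.0, 0, 9),   # internal branch above s3 and s4
]
TOTAL_A = 10  # total sequences in community A
TOTAL_B = 10  # total sequences in community B


def unweighted_unifrac(branches):
    """Fraction of observed branch length that leads to exactly one community."""
    unique = sum(length for length, a, b in branches if (a > 0) != (b > 0))
    observed = sum(length for length, a, b in branches if a > 0 or b > 0)
    return unique / observed


def weighted_unifrac_raw(branches, total_a, total_b):
    """Branch lengths weighted by the difference in relative abundance (unnormalized)."""
    return sum(
        length * abs(a / total_a - b / total_b) for length, a, b in branches
    )


unweighted = unweighted_unifrac(branches)
weighted = weighted_unifrac_raw(branches, TOTAL_A, TOTAL_B)
print(round(unweighted, 3))  # 0.667
print(round(weighted, 3))    # 3.6
```

In this toy case the unweighted value only asks whether each branch is seen in one community or both, while the weighted value is driven mostly by the abundant leaves s1 and s4, which is why the two can rank the same samples quite differently.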
Yes, I think I will go with unweighted (that is actually what I meant). BUT, for the weighted analysis: the UniFrac website asks for an abundance matrix and mothur does not, so I was wondering about the difference. I guess mothur has to group sequences somehow to do the weighted analysis?
I suppose that unifrac.weighted uses the names file to weight the distance value according to the number of replicate sequences, which brings up the question: why does unifrac.unweighted have a names option? Has anyone tried running unifrac.unweighted with and without a names file to see if it makes a difference?
Sorry I never followed up on this thread. The tree that you build (at least the way we do it) only uses the unique sequences. The names file brings in the redundant sequence names, and the group file tells the program which group each sequence belongs to. So it’s useful for the weighted analysis, because it gives the frequency of sequences in each group. Similarly, the unweighted analysis measures the fraction of branch length that is unique to a group. The names file is necessary there too, because a single unique sequence in the tree could represent multiple sequences, and those sequences could belong to different groups. In that case the branch would not be unique to one group. Make sense?
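To make that concrete, here is a small sketch of how a names file and a group file could be combined for each leaf of the tree. It assumes the usual two-column, tab-separated layouts (names: the representative sequence, then a comma-separated list of every sequence it represents; groups: the sequence name, then its group). The sequence names and helper functions are hypothetical, and this is not a description of mothur's internals.

```python
from collections import Counter

# Hypothetical file contents, invented for illustration.
names_text = "seq1\tseq1,seq2,seq3\nseq4\tseq4\n"
group_text = "seq1\tA\nseq2\tA\nseq3\tB\nseq4\tB\n"


def parse_groups(group_text):
    """Map every sequence name to its group."""
    seq_to_group = {}
    for line in group_text.splitlines():
        if line.strip():
            name, group = line.split("\t")
            seq_to_group[name] = group
    return seq_to_group


def leaf_group_counts(names_text, seq_to_group):
    """For each representative (tree leaf), count the sequences it stands for per group."""
    counts = {}
    for line in names_text.splitlines():
        if line.strip():
            representative, members = line.split("\t")
            counts[representative] = Counter(
                seq_to_group[member] for member in members.split(",")
            )
    return counts


seq_to_group = parse_groups(group_text)
with_names = leaf_group_counts(names_text, seq_to_group)
# Without the names file, each leaf would be known only by its representative's group.
without_names = {rep: Counter([seq_to_group[rep]]) for rep in with_names}

for rep in with_names:
    print(rep, dict(without_names[rep]), "->", dict(with_names[rep]))
```

With only the representative, the seq1 leaf looks unique to group A. Once the names file is applied it also carries a sequence from group B, so its branch is shared for the unweighted analysis and gets per-group frequencies for the weighted one.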
Question to the mothur community regarding unifrac: The intro to unifrac in mothur states: “The significance of the test statistic can only indicate the probability that the communities have the same structure by chance. The value does not indicate a level of similarity.” I assume the test statistic mentioned here is the unifrac distance. If this assumption is correct, this statement seems to contradict what is written on the unifrac website, where the unifrac distance is described as a measure of similarity between different populations, which is where the name unifrac (unique fraction) comes from. Quoting the example in the tutorial, a unifrac distance value of 0.68 indicates that 1/3 of the sequences are shared between two populations.