mothur

UniFrac with group file vs. count file

I ran unweighted UniFrac on my sequences, once with a group file and once with a count file, and got different values for the pairwise comparisons between groups. I'm having trouble understanding why. I assume you would need a name file or count file for weighted UniFrac because it accounts for abundances, but I thought unweighted UniFrac did not consider abundances. Does anyone have insight on this? Is the recommendation to run unweighted UniFrac with a group file or a count file?
(I double-checked my log files, and there is no mismatch between the name, group, and count files.)

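Mothur does not use the abundances in the unweighted calculation, but it does need to know which samples each unique sequence represents. Without the name file, mothur cannot map each unique read to all the samples it represents. (A count file carries the same information, because it lists the per-group counts for every unique sequence.)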

Consider the following group file (read, group) and name file (unique representative, all reads it represents):

seq1 group1
seq2 group2
seq3 group3
seq4 group2
seq5 group1
seq6 group3

seq1 seq1,seq2
seq3 seq3,seq4
seq5 seq5
seq6 seq6
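A minimal Python sketch of that mapping, using only the two example files above. It assumes the tree's leaves are exactly the uniques listed in the name file, and it parses on whitespace so tab- or space-separated files both work. The last loop prints the same information in a count-table-style layout (representative, total, one column per group); this mirrors mothur's count table only loosely and is not mothur's own code.

```python
from collections import Counter

# Group file from the example: read name -> sample (group) it came from.
GROUP_FILE = """
seq1 group1
seq2 group2
seq3 group3
seq4 group2
seq5 group1
seq6 group3
"""

# Name file from the example: unique representative -> every read it stands for.
NAME_FILE = """
seq1 seq1,seq2
seq3 seq3,seq4
seq5 seq5
seq6 seq6
"""

def parse_two_columns(text):
    """Split each non-empty line into (first column, second column)."""
    return dict(line.split() for line in text.strip().splitlines())

groups = parse_two_columns(GROUP_FILE)
names = {rep: members.split(",") for rep, members in parse_two_columns(NAME_FILE).items()}
leaves = list(names)  # assumption: these uniques are the leaves of the tree

def groups_per_leaf(use_name_file):
    """Sorted list of groups each leaf represents; without a name file a leaf is only itself."""
    return {leaf: sorted({groups[m] for m in (names[leaf] if use_name_file else [leaf])})
            for leaf in leaves}

print(groups_per_leaf(True)["seq1"])   # ['group1', 'group2']
print(groups_per_leaf(False)["seq1"])  # ['group1']

# Count-table-style view: representative, total reads, then reads per group.
group_order = sorted(set(groups.values()))
print("Representative_Sequence", "total", *group_order)
for rep, members in names.items():
    counts = Counter(groups[m] for m in members)
    print(rep, len(members), *(counts[g] for g in group_order))
```

Each count-table row (for example `seq1 2 1 1 0`) records that seq1 occurs in both group1 and group2, which is why a count file alone gives the same group mapping as a name file plus a group file.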

In the tree, seq1 represents reads from group1 and group2, but without the name file mothur would only see group1. Likewise, seq3 represents reads from group3 and group2, but would only be seen as group3.

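To compute the unweighted (UW) UniFrac value, mothur finds the total branch length and the unique branch length for each pair of groups; the value is the unique branch length divided by the total. Because a leaf in the tree may represent sequences from several groups, leaving out the name file makes branches that are actually shared between two groups look as if they belong to only one, which artificially inflates the unique branch total.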
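The calculation can be sketched in Python on a toy tree built from the four uniques in the example. The tree topology and all branch lengths below are hypothetical, invented only for illustration. The sketch uses the textbook definition (sum of branches leading to exactly one of the two groups, divided by the sum of branches leading to either), not mothur's source code, so mothur's handling of the root and of branches above the pair's common ancestor may differ.

```python
# Hypothetical rooted tree over the example uniques (lengths are made up):
#            root
#          /      \
#      A (0.4)   B (0.5)
#      /    \     /    \
#   seq1   seq5  seq3   seq6
#  (0.1)  (0.2) (0.1)  (0.3)
PARENT = {"seq1": "A", "seq5": "A", "seq3": "B", "seq6": "B", "A": "root", "B": "root"}
LENGTH = {"seq1": 0.1, "seq5": 0.2, "seq3": 0.1, "seq6": 0.3, "A": 0.4, "B": 0.5}

# Group file and name file from the example.
GROUPS = {"seq1": "group1", "seq2": "group2", "seq3": "group3",
          "seq4": "group2", "seq5": "group1", "seq6": "group3"}
NAMES = {"seq1": ["seq1", "seq2"], "seq3": ["seq3", "seq4"],
         "seq5": ["seq5"], "seq6": ["seq6"]}

def leaf_groups(names=None):
    """Groups each leaf stands for; without a name file a leaf is only itself."""
    leaves = [n for n in PARENT if n not in PARENT.values()]
    return {leaf: {GROUPS[m] for m in (names[leaf] if names else [leaf])}
            for leaf in leaves}

def node_groups(leaf_sets):
    """Union of leaf groups below every branch (the root has no branch)."""
    below = {node: set() for node in LENGTH}
    for leaf, groups in leaf_sets.items():
        node = leaf
        while node in LENGTH:  # climb until we reach the root
            below[node] |= groups
            node = PARENT[node]
    return below

def unweighted_unifrac(g1, g2, names=None):
    """Unique branch length / total branch length for one pair of groups."""
    total = unique = 0.0
    for node, groups in node_groups(leaf_groups(names)).items():
        present = groups & {g1, g2}
        if not present:
            continue  # branch leads to neither group: not part of this pairing
        total += LENGTH[node]
        if len(present) == 1:
            unique += LENGTH[node]  # branch leads to only one of the two groups
    return unique / total

print(round(unweighted_unifrac("group1", "group2", NAMES), 4))  # 0.6154 (with name file)
print(round(unweighted_unifrac("group1", "group2"), 4))         # 1.0 (group file only)
```

With the name file, the seq1 branch and its parent A lead to both group1 and group2, so they count as shared. With only the group file, group2 has no leaves at all in this toy tree, so every remaining branch is counted as unique: the extreme case of the inflation described above.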

OMG, that is so simple and makes so much sense. Thank you!
