Unifrac with unifrac distance


Please excuse me for asking what I guess are elementary questions, but I’m new to this kind of analysis. Anyway, here goes:

1.) I would like to run a unifrac significance test with unifrac distance. I have replicate data in 6 groups (3 replicates/group). I know I can incorporate replicates using a design file, but if I want to use the unifrac distance I guess I would need to do the following:


unifrac.weighted(tree=muscle.pick.good.filter.pick.subsample.phylip.tre, group=seqs.pick.good.pick.subsample.groups, random=T, name=muscle.pick.good.pick.subsample.names, distance=lt, processors=4)


unifrac.weighted(tree=muscle.pick.good.filter.pick.subsample.phylip.tre1.weighted.phylip.tre, group=design.design, groups=all, random=T)

Is this correct or should I do it some other way?

2.) What is the rationale behind running an unweighted unifrac test on a tree constructed with Yue & Clayton distances, as in the 454 SOP? As I understand it, ThetaYC accounts for OTU abundance and unweighted unifrac does not, but I might be wrong.

3.) I have seen differing opinions regarding whether rarefaction should be performed or not. In the mothur tutorials I see it being implemented. Is this still advisable from your point of view? I lose quite a lot of samples this way and I’m working with Sanger data (~300 sequences).

Your advice would be greatly appreciated.

Thanks in advance

Hi there…

First, it’s important to note that Unifrac can be used to calculate distances between samples or to test whether communities have different structures or memberships. In the former case your tree would be a tree of sequences and in the latter a tree of samples. You can do the hypothesis test on any set of samples that you’ve put into a tree (jaccard, thetayc, bray-curtis, unifrac, etc.). We had an ISME J paper a few years back that compared the different methods of comparing communities based on samples. I’d encourage people to use tools like amova or homova to compare sets of samples since it is clearer what the hypothesis tests are testing and because you don’t have to run the data through an additional filter like a tree.

Second, I would not suggest using unweighted unifrac or any of the OTU-based methods that are based on presence-absence (i.e. membership) data. These methods are highly sensitive to undersampling of communities.
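
To make the membership versus abundance point concrete, here is a minimal Python sketch. It is not mothur's implementation: the counts are made up, the function names are hypothetical, and the two formulas are the textbook Jaccard distance on presence-absence and 1 minus the Yue & Clayton theta on relative abundances. The two samples share the same dominant OTUs and differ only in which singletons happened to be picked up, which is what shallow sampling of one community tends to look like.

```python
def jaccard_distance(a, b):
    """Membership-based: 1 - (shared OTUs / OTUs observed in either sample)."""
    present_a = {i for i, n in enumerate(a) if n > 0}
    present_b = {i for i, n in enumerate(b) if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def thetayc_distance(a, b):
    """Abundance-based: 1 - Yue & Clayton theta on relative abundances."""
    total_a, total_b = sum(a), sum(b)
    p = [n / total_a for n in a]
    q = [n / total_b for n in b]
    pq = sum(x * y for x, y in zip(p, q))
    pp = sum(x * x for x in p)
    qq = sum(y * y for y in q)
    return 1.0 - pq / (pp + qq - pq)


# Hypothetical OTU counts (100 sequences each): three shared dominant OTUs,
# plus five singletons per sample that do not overlap.
sample_a = [50, 30, 15, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
sample_b = [48, 32, 15, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

d_membership = jaccard_distance(sample_a, sample_b)
d_structure = thetayc_distance(sample_a, sample_b)
print(f"Jaccard distance: {d_membership:.3f}")
print(f"thetaYC distance: {d_structure:.3f}")
```

With these made-up counts the membership-based distance is large because only 3 of the 13 observed OTUs are shared, while the abundance-based distance stays close to zero because the singletons carry almost no weight.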

Third, yes, you must rarefy your data to generate a distance matrix between samples. All of the methods are sensitive (albeit to varying degrees) to uneven sequencing coverage.
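
As a rough illustration of what rarefying to a common depth means, here is a small Python sketch. It assumes per-sample OTU count vectors and draws sequences at random without replacement; the sample names, counts, and the `rarefy` helper are all hypothetical, and this is not how mothur does it internally.

```python
import random


def rarefy(counts, depth, rng):
    """Draw `depth` sequences without replacement from one sample's OTU counts.

    Returns a new count vector, or None if the sample has fewer than
    `depth` sequences (such a sample would be dropped).
    """
    # Expand the counts into one entry per sequence, labelled by OTU index.
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    if len(pool) < depth:
        return None
    drawn = rng.sample(pool, depth)
    rarefied_counts = [0] * len(counts)
    for otu in drawn:
        rarefied_counts[otu] += 1
    return rarefied_counts


# Hypothetical samples with uneven sequencing depth.
samples = {
    "A1": [120, 80, 60, 30, 10],
    "A2": [200, 150, 90, 40, 20],
    "B1": [90, 70, 50, 40, 50],
}

# Here the depth is the size of the smallest sample; in practice the
# threshold is a judgment call, and samples below it are lost.
depth = min(sum(counts) for counts in samples.values())
rng = random.Random(1)  # fixed seed so the sketch is repeatable
rarefied = {name: rarefy(counts, depth, rng) for name, counts in samples.items()}

for name, counts in rarefied.items():
    print(name, counts, sum(counts))
```

Every sample ends up with the same number of sequences, so the distances calculated afterwards are not driven by differences in sequencing effort. The cost is the one raised in the question: any sample below the chosen depth is lost.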

Hope this helps,

Thanks Pat!

Your reply was most helpful!