I have encountered an interesting discrepancy between the results of the dendrogram and ordination approaches. If we use the same calculators (jclass and thetayc) to make the distance matrices for the ordination approaches (dist.shared) as we use to make the tree.shared dissimilarity UPGMA dendrograms - should we expect the results to be congruent?
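For readers following along, here is a minimal Python sketch of what the two calculators measure, written from the published formulas as I understand them rather than from mothur's source. The function names and the OTU counts are hypothetical. jclass uses only presence/absence (membership), while thetayc uses relative abundances (structure), so the two can rank the same pair of samples very differently.

```python
def jclass_dissimilarity(a, b):
    """1 - (OTUs shared) / (OTUs seen in either sample); presence/absence only."""
    present_a = {i for i, n in enumerate(a) if n > 0}
    present_b = {i for i, n in enumerate(b) if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def thetayc_dissimilarity(a, b):
    """1 - theta_YC, computed from relative abundances (assumes non-empty samples)."""
    total_a, total_b = sum(a), sum(b)
    pa = [n / total_a for n in a]
    pb = [n / total_b for n in b]
    cross = sum(x * y for x, y in zip(pa, pb))
    denom = sum(x * x for x in pa) + sum(y * y for y in pb) - cross
    return 1.0 - cross / denom


# Hypothetical OTU counts: the second sample contains every OTU of the
# first, but its abundance is dominated by two OTUs the first lacks.
sample_a = [50, 30, 20, 0, 0]
sample_b = [5, 3, 2, 60, 30]

jc = jclass_dissimilarity(sample_a, sample_b)
yc = thetayc_dissimilarity(sample_a, sample_b)
print(f"jclass = {jc:.3f}, thetayc = {yc:.3f}")
```

In this made-up case the membership-based distance is moderate while the structure-based distance is close to 1, which is one reason the two calculators need not produce the same groupings.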
In my case I am finding that the dendrograms find different clustering (supported by amova p<0.001) than the ordination plots (supported by amova p<0.001). How can we determine which approach to use if they seem to find different clustering that is highly supported?
To be more explicit about the conflict: I have four samples, three water samples (W1, W2, W3) and one sediment sample (Sed). With tree.shared I get highly supported clustering of Sed-W2-W3 at the genus through phylum levels for both the jclass and thetayc calculators. In all dendrograms W1 clusters separately from the others. From this I concluded that the structure and richness of the microbial community in sample W1 were distinct from those of all other samples.
Next, I used ordination approaches because I had hoped to make a biplot showing which phylotypes caused W1 to be so distinct from the other samples. However, to my surprise, the pcoa plots consistently clustered the water samples to the exclusion of the sediment sample. W1 was no longer distinct.
I am at a loss for how to interpret this discrepancy. Any suggestions would be greatly appreciated.
So first off, remember that ordination and clustering methods are data visualization tools. AMOVA is independent of those tools and uses the same distance matrix as is used to run ordination and clustering. Also, I’m not sure how you can get meaningful results from AMOVA if you have 1 sample from a group (i.e. Sed).
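To make the single-sample concern concrete, here is a rough sketch of a PERMANOVA-style pseudo-F computed from a distance matrix, with every possible relabeling enumerated. It assumes the test works by permuting group labels; it is not taken from mothur's amova code, and the distance values and function names are hypothetical. With three samples in one group and one in the other there are only four distinct labelings, so an exact permutation p-value cannot fall below 0.25.

```python
from itertools import combinations


def pseudo_f(dist, groups):
    """PERMANOVA-style pseudo-F from a square distance matrix and one label per sample."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        # A one-sample group has no within-group pairs, so it contributes 0.
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def exact_permutation_p(dist, groups, singleton_label):
    """Try the singleton label on each sample in turn (all distinct labelings)."""
    observed = pseudo_f(dist, groups)
    other = next(g for g in groups if g != singleton_label)
    n = len(groups)
    f_values = []
    for k in range(n):
        relabeled = [singleton_label if i == k else other for i in range(n)]
        f_values.append(pseudo_f(dist, relabeled))
    hits = sum(1 for f in f_values if f >= observed - 1e-12)
    return observed, hits / len(f_values)


# Hypothetical distances, sample order: W1, W2, W3, Sed
dist = [
    [0.00, 0.20, 0.25, 0.80],
    [0.20, 0.00, 0.15, 0.70],
    [0.25, 0.15, 0.00, 0.75],
    [0.80, 0.70, 0.75, 0.00],
]
groups = ["water", "water", "water", "sediment"]

observed_f, p_value = exact_permutation_p(dist, groups, "sediment")
print(f"pseudo-F = {observed_f:.2f}, exact p = {p_value}")
```

Even though the made-up sediment sample is far from all the water samples and the pseudo-F is large, the smallest p-value this design can produce is 1 in 4.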
While I wouldn’t necessarily expect ordination and clustering to give identical output, there are several possible explanations for the discrepancy. First, if the tree isn’t depicted as being rooted and you only have one Sed sample, you could possibly get this result. Second, if a third axis is meaningful, it is possible that the difference lies along that axis, so you don’t see it in 2D but do see it in the dendrogram. Third, it’s possible that it’s just a difference in algorithms.
Hope this helps…
I looked into the suggestions you proposed.
- When you mentioned that AMOVA may not be meaningful with a single sample in a group, do you mean that the calculation of the within-group sum of squares will not work correctly? Specifically, that the d_ij value will be 0 since the distance between samples (Sed-Sed) is 0? If that is the case, do you have any other suggestions for how I might compare this type of data (e.g. Sed vs W1-W2-W3)? I believe that LIBSHUFF should be appropriate since it does pairwise comparisons. However, ANOSIM probably is not, since like AMOVA it compares distances within a group to distances between groups.
- The trees are depicted as rooted so that probably isn’t the issue.
- The 3rd axes of the pcoa generally have loadings representing less than 1% of the variance in the data - not very meaningful.
- I misspoke in my last post. The disagreement was actually between a dendrogram found to have significant clustering using weighted UniFrac and a pcoa plot found to have significant clustering using amova. I did not use amova to test the significance of clustering in the tree.shared output because I was unable to generate a phylip-formatted distance matrix. (FYI: the tree.shared manual page is missing information about how to use the phylip flag.)
- I went back and confirmed that when the phylip formatted distance matrix (generated from dist.shared) is used to generate dendrograms it actually creates the same clustering as the pcoa plots. This suggests to me that the reason the clustering and ordination approaches produce different results is due to the differences between the tree.shared algorithm and the dist.shared algorithm when operated on the same input file (sample.tx.shared).
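The check described in the last point, building a dendrogram directly from the same distance matrix that feeds the ordination, can be sketched with a small UPGMA (average-linkage) routine. This is an illustrative implementation only, not mothur's; the function name and the distance values are hypothetical.

```python
def upgma(names, dist):
    """Return a nested-tuple tree built by average-linkage (UPGMA) clustering."""
    # cluster id -> (subtree, number of leaves)
    clusters = {i: (names[i], 1) for i in range(len(names))}
    # pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # size-weighted mean = average over all leaf pairs
            merged[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(merged)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical distances, sample order: W1, W2, W3, Sed
names = ["W1", "W2", "W3", "Sed"]
dist = [
    [0.00, 0.20, 0.25, 0.80],
    [0.20, 0.00, 0.15, 0.70],
    [0.25, 0.15, 0.00, 0.75],
    [0.80, 0.70, 0.75, 0.00],
]

tree = upgma(names, dist)
print(tree)
```

When the tree and the ordination are both derived from one matrix like this, they are two views of the same distances, so a disagreement between them points to different distances going in rather than to the visualization step.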
Alas, that brings me back to the same place of having to decide whether clustering or ordination is more appropriate for my data. It doesn’t really make sense to use ordination if I don’t have a method of determining whether the clustering is significant. Clustering methods it is, then.
Thanks for your suggestions,
I’m doing ordination analysis for the first time.
I have two conditions (control and diseased) and 8 samples in total.
I ran a dbRDA (distance-based RDA):
first I transformed the data using the Hellinger transformation,
then ran the dbRDA using the capscale function in R with the Bray-Curtis distance.
I want to plot it using my environmental factor (there is just one).
How can I do it?
Please advise if anyone has done the same kind of analysis.
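For reference, the two preprocessing steps named above can be sketched in a few lines. This is a Python illustration of the arithmetic only, not the R workflow and not a plotting answer; the function names and counts are hypothetical. The Hellinger transformation takes the square root of each relative abundance, and the Bray-Curtis dissimilarity is the summed absolute difference divided by the summed total.

```python
from math import sqrt


def hellinger(counts):
    """Square root of relative abundances (assumes the sample total is non-zero)."""
    total = sum(counts)
    return [sqrt(n / total) for n in counts]


def bray_curtis(x, y):
    """Sum of absolute differences divided by the sum of all values."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


# Hypothetical taxon counts for one control and one diseased sample
control = [10, 0, 30, 60]
diseased = [40, 40, 20, 0]

h_control = hellinger(control)
h_diseased = hellinger(diseased)
print(bray_curtis(h_control, h_diseased))
```

After the Hellinger step each sample's squared values sum to 1, so the resulting distances reflect composition rather than sequencing depth.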