NMDS + corr.axes vs CCA?

I’m not experienced in statistics and, while trying to get the most out of my data, I have a question: which of the following options is better, and are both valid for community analysis based on 16S sequences?

  • One option is to plot the ordination (NMDS or PCoA) and, as a biplot, overlay the vectors from the correlation of the environmental variables, and also of the OTUs, with the axes, to see which OTUs are responsible for the differences in community structure. Is that right? And in that case, should I plot the vectors for all OTUs, or is there a way to plot only those that are statistically significant in shifting the samples along the axes?

  • The other option would be Canonical Correspondence Analysis (CCA), which even allows a triplot with the ordination of samples, the OTUs, and the vectors from the environmental variables. Is that correct? But in this case I won’t have any p-values to decide which OTUs are most responsible for the ordination of the samples.

I would appreciate some help here!
Thanks to all!

Since this post has gone unanswered for a while, I’ll take a crack at it, although I’m not completely confident in my understanding of how CCA works.

I would recommend the first option (NMDS or PCoA) over CCA. CCA is really for when you’re trying to discard the influence of certain variables in your system, but it seems like you’re more interested in plotting the data as they are and then looking for the biggest influences. Between NMDS and PCoA it doesn’t matter too much which you use. If you can get a good NMDS then I’d say that’s ‘better’ for answering your question, but it’s not anything to stress over if you use PCoA.

The output of the corr.axes() command gives you a p-value for each vector, so you just need to filter this output to keep the significant ones. I usually do this in R: first I plot the results of an NMDS or PCoA, then I overlay the significant vectors (or, if there are too many to read easily, only the longest of them) in a different format.
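
As a rough illustration of that filtering step (in Python rather than R, and only a sketch): the snippet below assumes the corr.axes output can be read as a tab-separated table with one row per OTU, a correlation and a p-value for each of two axes, and a vector length. The column names (`OTU`, `axis1`, `p1`, `axis2`, `p2`, `length`), the 0.05 cutoff, the function name, and every value in the example are made up for the illustration, so check the header of your own file before reusing any of it.

```python
import csv
import io

# Made-up rows standing in for a corr.axes output file.
# The column names are hypothetical; adjust them to your file's header.
EXAMPLE = (
    "OTU\taxis1\tp1\taxis2\tp2\tlength\n"
    "Otu001\t0.90\t0.002\t0.10\t0.810\t0.906\n"
    "Otu002\t0.20\t0.630\t-0.85\t0.007\t0.873\n"
    "Otu003\t0.30\t0.470\t0.25\t0.550\t0.391\n"
    "Otu004\t-0.75\t0.032\t0.40\t0.330\t0.850\n"
)


def significant_vectors(handle, alpha=0.05, top_n=None):
    """Keep rows with p < alpha on at least one axis, longest vector first.

    If top_n is given, only the top_n longest significant vectors are kept.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if min(float(row["p1"]), float(row["p2"])) < alpha:
            rows.append({
                "OTU": row["OTU"],
                "axis1": float(row["axis1"]),
                "axis2": float(row["axis2"]),
                "length": float(row["length"]),
            })
    rows.sort(key=lambda r: r["length"], reverse=True)
    return rows if top_n is None else rows[:top_n]


selected = significant_vectors(io.StringIO(EXAMPLE))
for r in selected:
    # (axis1, axis2) is the tip of the arrow drawn from the plot origin.
    print(r["OTU"], r["axis1"], r["axis2"], r["length"])
```

For a real file you would pass an open file handle instead of `io.StringIO(EXAMPLE)`, and use `top_n` when there are too many significant vectors to read on one plot.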

Thank you very much for your explanation! You convinced me; thinking it over now, I fully agree with you :smiley:

This one is old, but I’ll jump in too. PCoA is only appropriate if the underlying gradient that is shifting your communities is linear, which is pretty much never the case for natural communities. PCoA was developed in the 1960s because the computational power to run NMS (NMDS) was expensive. The resurgence of PCoA is linked to lazy stats and coding (the algorithm is the same as for PCA but with a non-Euclidean distance measure).

However, I agree with the explanation that CCA is appropriate if you are trying to remove the effect of a dominant variable. Don’t fall into the logical trap of thinking that a CCA “proves” your hypothesis that X is important because the CCA shows that X separates your samples. Of course X separates your samples on the CCA: you told it to consider only the variability that is explained by X.

Thank you for the explanations! Could you please explain in more detail how CCA helps to remove the effect of dominant variables? When I ran CCA, I got an ordination of my samples that agreed with what I already know about them (it was logical), the OTUs most likely responsible for the big separations, and two environmental variables that supported what I expected from what I know about the environment I am working with.
Then I ran NMDS, and the ordination did not convince me. PCoA gave a more believable ordination, but corr.axes with OTUs pointed to some OTUs as responsible for separating certain samples, and that is not reflected when I inspect the abundance of those OTUs in each sample. In my case, CCA better depicted what I more or less expected from the environment, but I’d like to show PCoA and corr.axes in my publication instead of CCA. I am working with relatively few samples from a diesel spill in Antarctica. I know the place, which samples were impacted, and the history, so I more or less have “informal” metadata from having been there during the sampling.
Is it possible that an OTU that CCA recognizes as relevant for separating the highly contaminated soils, and that also stands out clearly in the stacked bar plots of relative abundances of the most abundant OTUs, is not recognized by corr.axes and PCoA?
Thank you for your thoughts on this!


NMS is a good reality check of your assumptions because it does the best job of preserving the original non-parametric relationships between samples. So if you only see the “good” ordination using CCA, then the effect that you want to see is pretty weak. Ordinations aren’t hypothesis tests, though: if you want to say that these samples are similar to each other and different from those samples, you need to test that with AMOVA/MRPP/PERMANOVA. Your best bet, if you don’t know the stats, is to find someone who does and who is willing to look at your data and results with you.

(Sorry, I won’t discuss PCoA: if your data are linear use PCA, if not use NMS. If you see something in PCoA that you don’t see in NMS, I’m willing to bet a beer that it’s an artifact of violating the linearity assumption.)
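
To make the "ordinations aren't hypothesis testing" point concrete, here is a minimal pure-Python sketch of a one-way PERMANOVA-style permutation test on a distance matrix. It is not what any particular package runs: the function names `pseudo_f` and `permanova`, the toy distances, and the group labels are all made up for the example, and in practice you would use an established implementation.

```python
import random
from itertools import combinations


def pseudo_f(dist, groups):
    """Pseudo-F for a one-way design, from a square distance matrix.

    Assumes at least two groups and some within-group variation
    (otherwise the denominator is zero).
    """
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        ss_within += sum(
            dist[i][j] ** 2 for i, j in combinations(members, 2)
        ) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between labels and distances
        # Small relative tolerance so float rounding does not hide exact ties.
        if pseudo_f(dist, shuffled) >= observed * (1 - 1e-9):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: six samples on a line, three per (hypothetical) group.
positions = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
toy_groups = ["clean", "clean", "clean", "spill", "spill", "spill"]
toy_dist = [[abs(p - q) for q in positions] for p in positions]

f_stat, p_value = permanova(toy_dist, toy_groups)
print(f_stat, p_value)
```

Even though the two toy groups are perfectly separated, there are only 20 ways to split six samples three against three, and two of them reproduce the observed split, so the p-value stays around 0.1. That is the practical limit of these tests with very few samples.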

Thank you for your explanation! At this point I think I don’t understand, as well as I thought I did, what linear or non-linear data are, or even what makes a sample comparison parametric or non-parametric. I cannot run the statistical tests you mention because the data I am working with now come from 8 clone libraries, without enough replicates for those tests. The ordination is of these 8 samples, and the correlation is with the hydrocarbon content of the soils those communities belong to.
CCA did better at finding the OTUs that characterize the separation according to hydrocarbon content; PCoA combined with Spearman correlation of the OTUs against the PCoA axes did not do as well. NMDS gave an ordination that doesn’t show the separation according to hydrocarbon content, although the communities are clearly shifted, especially in one very contaminated soil.
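
One possible reason for this kind of mismatch (only a sketch, not a diagnosis of these data) is that Spearman correlation works on ranks rather than on abundances. The pure-Python example below uses made-up axis scores and read counts for 8 samples; the function names and all numbers are hypothetical. An OTU that gains a single read from one sample to the next correlates perfectly with the axis, while an OTU that is absent everywhere except for a large bloom in the last sample scores much lower because of the tied zeros.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up ordination scores on one axis for 8 samples.
axis1 = [-0.42, -0.31, -0.18, -0.05, 0.04, 0.15, 0.29, 0.48]
# Hypothetical rare OTU with a steady trend along the axis.
otu_trend = [0, 1, 2, 3, 4, 5, 6, 7]
# Hypothetical OTU that blooms only in the most extreme sample.
otu_spike = [0, 0, 0, 0, 0, 0, 0, 500]

rho_trend = spearman(axis1, otu_trend)
rho_spike = spearman(axis1, otu_spike)
print(round(rho_trend, 3), round(rho_spike, 3))
```

A stacked bar plot would highlight the second OTU, whereas a rank correlation with the axis favors the first, so the two views need not agree.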

I will try to find some basic “for dummies” explanation of CCA, NMDS, PCoA and linear vs non-linear data and try to get a better understanding of my data :slight_smile: