I was running a metadata correlation test using the corr.axes command. I have a small number of samples (<10) in my dataset. There are obvious and direct clusters in my data, so I assumed that the correlation analysis would work perfectly.
When using the Spearman coefficient, my results were not significant and were oddly similar between environmental variables. (The lengths and p-values were identical. While my environmental variables follow similar patterns, they are by no means identical.)
When using the Pearson coefficient, my results differed quite a bit between environmental variables; the lengths and p-values were very different, and some were significant and some were not. (As expected.)
What are the differences between the two? Is there rationale for using one and not the other? Any light which could be shed on this issue would be helpful.
Thanks for your time.
The basic difference is that the Pearson coefficient is a measure of linear correlation, while Spearman is a measure of monotonic correlation. A monotonic relationship is one where the response variable moves in a single direction, but not necessarily in a linear fashion, for example, y = x^2 for positive x.
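To make the linear vs. monotonic distinction concrete, here is a minimal sketch in plain Python (not what corr.axes does internally, just the textbook definitions). The helper names `pearson`, `ranks` and `spearman` and the toy data are my own for illustration. Spearman is computed as Pearson on the ranks, with tied values given their average rank.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy data: y = x^2 for positive x is monotonic but not linear.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 2 for v in x]

pearson_r = pearson(x, y)
spearman_rho = spearman(x, y)
print(f"Pearson r    = {pearson_r:.3f}")
print(f"Spearman rho = {spearman_rho:.3f}")
```

Spearman comes out at 1 (up to floating-point rounding) because the ordering of y matches the ordering of x exactly, while Pearson is a bit below 1 because the points do not fall on a straight line.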
For picking between the two, I think the general wisdom (someone may correct me here) is that Pearson makes more sense when your measurements are on an interval scale (temperature, distance, etc.). Because Spearman correlation is calculated using ranks, it works better for systems where the values can be ordered but the differences between them aren't directly comparable. The best example I can think of is that in conservation/ecology studies people will sometimes investigate sites/habitats and assign them an arbitrary score of how impacted they are by human activity. They use a score like 1 = no impact, 5 = completely urbanised. Although it's clear within the scale that a site with 5 is 'more impacted' than a site with 4, it's not clear how much more impacted it is. In those situations a Spearman correlation makes more sense.
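One consequence of Spearman using only ranks is that two variables which order the samples the same way give exactly the same Spearman result, however different their actual values are. With fewer than 10 samples that can happen quite easily, and it may be what is behind the identical results in the original question, though I can't tell without seeing the data. A rough sketch with made-up numbers (the variable names and values are hypothetical, and the simple ranking here assumes no tied values):

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank values from 1..n (assumes no ties, for brevity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data for 6 samples: one ordination axis and two
# environmental variables with different values but the same ordering.
axis = [0.5, 0.1, 0.9, 1.2, 1.1, 2.0]
env_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
env_b = [0.1, 0.2, 0.4, 3.0, 50.0, 51.0]

rho_a, rho_b = spearman(env_a, axis), spearman(env_b, axis)
r_a, r_b = pearson(env_a, axis), pearson(env_b, axis)

print(f"Spearman: env_a = {rho_a:.3f}, env_b = {rho_b:.3f}")
print(f"Pearson:  env_a = {r_a:.3f}, env_b = {r_b:.3f}")
```

Here env_a and env_b have identical ranks, so their Spearman coefficients are the same, while the Pearson coefficients differ because Pearson uses the actual values.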