How is it possible to perform multivariate analysis when the number of variables (= taxa) is much higher than the number of samples?

Hello mothur users!

I followed the mothur MiSeq SOP pipeline to preprocess my environmental 16S rRNA amplicon datasets (V4-V5 hypervariable regions) and assign taxonomy.

Now I’m doing downstream analysis. In particular, I’m interested in how similar my samples are to each other. To find out, I built a distance matrix with the unweighted UniFrac metric so that I could plot the results using principal coordinates analysis (PCoA). However, some doubts arose because of the small number of samples (n=9) in my dataset.

The specific question is: ‘How is it possible to perform multivariate analysis (through PCoA) when the number of variables (unweighted UniFrac metrics, based on presence/absence of OTUs between samples) is much higher than the number of observations/samples (n=9)?’


I'm a newbie in bioinformatics and, unfortunately, I'm not good at statistics. Can anyone shed some light on this? Thanks in advance. Kind regards, @renh@

First, don’t do PCoA unless you know your underlying gradient is linear (I’ve yet to see a natural community where it is). Use NMS (non-metric multidimensional scaling) instead; it only assumes monotonicity rather than linearity.

Second, your variables aren’t taxa for ordinations; they are samples. You’ve transformed your data into a dissimilarity matrix, so you have the same number of variables as you have samples.
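
To make the point about the dissimilarity matrix concrete, here is a minimal sketch in Python with NumPy (not mothur's own implementation). Everything in it is an assumption for illustration: the presence/absence table is random, the figure of 500 OTUs is made up, and Jaccard dissimilarity stands in for unweighted UniFrac because no tree is available here. It shows that 9 samples give a 9 x 9 matrix however many OTUs there are, and that a classical PCoA on it can return at most n - 1 = 8 axes.

```python
import numpy as np

# Hypothetical data: 9 samples x 500 OTUs, random presence/absence.
rng = np.random.default_rng(0)
n_samples, n_otus = 9, 500
presence = rng.random((n_samples, n_otus)) < 0.3


def jaccard_matrix(pa):
    """Pairwise Jaccard dissimilarity between the rows of a presence/absence table.

    Used here only as a tree-free stand-in for unweighted UniFrac.
    """
    pa = pa.astype(int)
    shared = pa @ pa.T                      # OTUs present in both samples
    counts = pa.sum(axis=1)                 # OTUs present in each sample
    union = counts[:, None] + counts[None, :] - shared
    return 1.0 - shared / union


def pcoa(dist):
    """Classical PCoA: double-centre the squared distances, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gower)
    order = np.argsort(eigvals)[::-1]       # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10                  # drop zero and negative eigenvalues
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, coords


dist = jaccard_matrix(presence)
eigvals, coords = pcoa(dist)

print("distance matrix shape:", dist.shape)    # samples x samples, not samples x OTUs
print("ordination coordinates:", coords.shape)  # one row per sample, at most n - 1 axes
```

The number of OTUs only affects how each pairwise dissimilarity is computed; the ordination itself only ever sees the sample-by-sample matrix.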

Thank you kmitchell.
Cheers,
@renh@