pca running out of memory

hi guys,

I’ve been trying to run pca on my shared files. I have 4 groups with a total of 45k OTUs and 3 million seqs between them, and I’m running out of memory every time on a fairly well stocked PC. I asked my coworker, who’s done analyses on datasets this size before using pca in mothur, and she said she never ran into this problem. Is this true? Is something wrong with my process, or do I simply need access to a supercomputer?

Maybe you can try grouping your OTUs with the filter.shared command to simplify your dataset before PCA?
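
A rough back-of-the-envelope for why reducing the number of OTUs might help. This assumes the calculation holds a dense OTU-by-OTU matrix of 8-byte doubles in memory, which is a guess and not something confirmed about mothur's pca command. The function name is made up for illustration.

```python
def dense_square_matrix_bytes(n_otus, bytes_per_value=8):
    """Bytes needed for a dense n_otus x n_otus matrix.

    Hypothetical memory model for illustration only; it is an
    assumption, not a description of mothur's internals.
    """
    return n_otus * n_otus * bytes_per_value


# Compare the full dataset against smaller, filtered OTU counts.
for n in (45_000, 10_000, 1_000):
    gb = dense_square_matrix_bytes(n) / 1e9
    print(f"{n:>6} OTUs -> {gb:.2f} GB")
```

Under that assumption, 45,000 OTUs comes to about 16 GB for a single matrix. The cost grows with the square of the OTU count, so dropping rare OTUs shrinks it quickly.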

So do you really want PCA? Why not run dist.shared with an ecological distance calculator and then run that through the pcoa or nmds commands? I think there’s generally good agreement that PCA is not appropriate for OTU data.
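
To show what an ecological distance between groups looks like, here is a minimal sketch in plain Python. Bray-Curtis is used only as one familiar example; the post does not say which calculator to pick. The group names and counts are made up, and this is not how mothur computes it internally.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    if len(a) != len(b):
        raise ValueError("count vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # convention chosen here for two empty samples
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Made-up OTU counts per group; rows of a shared file would play this role.
counts = {
    "groupA": [10, 0, 5, 0],
    "groupB": [8, 2, 5, 0],
    "groupC": [0, 0, 1, 14],
}

# Pairwise distance matrix between groups, the kind of input pcoa or nmds expects.
names = sorted(counts)
dist = {(p, q): bray_curtis(counts[p], counts[q]) for p in names for q in names}
for p in names:
    print(p, [round(dist[(p, q)], 3) for q in names])
```

The result is a small group-by-group matrix (3 x 3 here, 4 x 4 for the dataset in question), no matter how many OTUs there are. The ordination step then works on that matrix.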

I’ve already done both.

what is pca useful for specifically? My PI really wants me to have a pca chart for whatever reason haha.

for normally distributed data? sometimes people slip and use “PCA” and “PCoA” interchangeably and don’t really know the difference…

thanks! so let’s say i wanted to graphically represent the similarity in terms of community composition (richness included) between groups, what would be the best way to show this?

also, I’m not a stats guy, so let’s say I take for example this pca chart:

how is the input data normally distributed? Thanks for your time.

Thanks for making my point :) This is not a “pca chart” - it’s a “PCoA chart”. The input data would need to be normally distributed and not be sparse for PCA to make sense. PCA is really a specific case of PCoA where the PCA distance matrix is essentially the correlation matrix.
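
One standard way to make the PCA/PCoA link concrete is to feed PCoA plain Euclidean distances. This framing is an assumption here: the post puts it in terms of the correlation matrix and does not spell out the details. In that case the matrix PCoA decomposes matches the one that gives PCA sample scores. The sketch below checks this on a tiny made-up table, in plain Python, without doing the eigendecomposition itself.

```python
def center_columns(X):
    """Subtract each column's mean, as PCA does before decomposition."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def gram(X):
    """Sample-by-sample inner products, X * X^T."""
    return [[sum(a * b for a, b in zip(r, s)) for s in X] for r in X]


def squared_euclidean(X):
    """Squared Euclidean distance between every pair of rows."""
    return [[sum((a - b) ** 2 for a, b in zip(r, s)) for s in X] for r in X]


def double_center(D2):
    """Gower centering used by PCoA: B = -1/2 * J * D2 * J."""
    n = len(D2)
    row_mean = [sum(row) / n for row in D2]
    col_mean = [sum(D2[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[-0.5 * (D2[i][j] - row_mean[i] - col_mean[j] + grand)
             for j in range(n)] for i in range(n)]


# Made-up data: 4 samples (rows) by 3 variables (columns).
X = [[2.0, 0.0, 1.0], [1.0, 3.0, 0.0], [0.0, 1.0, 4.0], [3.0, 2.0, 2.0]]

B_pcoa = double_center(squared_euclidean(X))  # what PCoA decomposes
B_pca = gram(center_columns(X))               # what yields PCA sample scores
max_diff = max(abs(B_pcoa[i][j] - B_pca[i][j])
               for i in range(len(X)) for j in range(len(X)))
print(max_diff)  # close to zero, up to floating-point rounding
```

With any other distance (Bray-Curtis, for example) the two matrices no longer match. That is why an ordination built from an ecological distance is a PCoA and not a PCA.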


i see i see. thank you so much.