Hi, I’m a little confused by the documentation for otu.association. What is being correlated: the presence of OTUs, or the abundance of reads per OTU per library?

The abundance of reads per OTU per library. Let’s look at the example below:

                Otu001  Otu002
F003D000           422     413
F003D002          1012     186
F003D004           948     439
F003D006           967     468
F003D008           350     316
F003D142           744     839
F003D144           488     609
F003D146           470     621
F003D148           414     526
F003D150           492     707
MOCK.GQY1XT001       0       0

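The two vectors being correlated are the OTU columns, read down the libraries (X is Otu001, Y is Otu002):

X = 422,1012,948,967,350,744,488,470,414,492,0
Y = 413,186,439,468,316,839,609,621,526,707,0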
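
To make the three coefficients concrete, here is a rough Python sketch that computes them for the X and Y vectors above. This is only an illustration of what pearson, spearman and kendall measure, not mothur's actual code. The helper functions are written by hand for the example, and they assume there are no tied values (true for these two vectors), so mothur's results could differ where it handles ties or rounding differently.

```python
from itertools import combinations
from math import sqrt

# Read counts per library, copied from the example table.
x = [422, 1012, 948, 967, 350, 744, 488, 470, 414, 492, 0]  # Otu001
y = [413, 186, 439, 468, 316, 839, 609, 621, 526, 707, 0]   # Otu002


def pearson(a, b):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def ranks(values):
    """Rank each value from 1 (smallest). Assumes no ties (illustration only)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def kendall(a, b):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs. Assumes no ties."""
    score = 0
    for i, j in combinations(range(len(a)), 2):
        product = (a[i] - a[j]) * (b[i] - b[j])
        score += (product > 0) - (product < 0)
    n_pairs = len(a) * (len(a) - 1) // 2
    return score / n_pairs


for name, func in (("pearson", pearson), ("spearman", spearman), ("kendall", kendall)):
    print(name, round(func(x, y), 4))
```

All three values fall between -1 and 1. Pearson works on the raw counts, while spearman and kendall only look at the ordering of the counts across libraries.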

OK, so the purpose of this test is to find OTUs that change in abundance across the libraries in a similar way?

It’s basically doing a correlation analysis between all OTUs across the groups. Think of it as dist.shared, but instead of comparing the rows of the shared file (the groups), it compares the columns (the OTUs), and instead of thetayc, jclass, etc., it uses pearson, spearman or kendall. Other than that, it’s the same :)
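
The rows-versus-columns idea can be sketched in Python like this. It is a toy illustration, not how mothur is implemented: the `shared` dictionary is just the example table typed in by hand, and `pearson` is a hand-written helper. The point is that the table is transposed so that each OTU becomes one vector, and then every pair of OTUs is compared.

```python
from itertools import combinations
from math import sqrt

otu_names = ["Otu001", "Otu002"]

# One row per group (library), one count per OTU, as in the example table.
shared = {
    "F003D000": [422, 413],
    "F003D002": [1012, 186],
    "F003D004": [948, 439],
    "F003D006": [967, 468],
    "F003D008": [350, 316],
    "F003D142": [744, 839],
    "F003D144": [488, 609],
    "F003D146": [470, 621],
    "F003D148": [414, 526],
    "F003D150": [492, 707],
    "MOCK.GQY1XT001": [0, 0],
}

# Transpose: rows (groups) become columns (OTUs), one vector per OTU.
columns = dict(zip(otu_names, zip(*shared.values())))


def pearson(a, b):
    """Pearson correlation between two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


# One coefficient per pair of OTUs (a single pair here, since there are two OTUs).
associations = {
    (a, b): pearson(columns[a], columns[b])
    for a, b in combinations(otu_names, 2)
}

for (a, b), r in associations.items():
    print(a, b, round(r, 4))
```

With more OTUs in the table, the same loop would produce one coefficient for every OTU pair, which is the column-wise analogue of the pairwise distances dist.shared gives for groups.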

OK, this might be a stupid question as follow-up, but if I have OTUs from different kingdoms (or different marker genes), can I explore how they co-occur across my samples using otu.association? Also, can I take environmental data and include that in a column as an “OTU” and test it, to see what may relate in abundance to specific OTUs? Or should I be using a different test? By “environmental data” I mean physicochemical measurements relating to the environments my samples are from.

Hmmm… No, but it will be.

Correction: there is a metadata option in otu.association. You can supply a metadata file, and it will calculate the correlations between the OTUs and the metadata.
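
Conceptually, the metadata variable is treated as one more vector with a value per sample, and each OTU is correlated against it. A rough Python sketch of that idea follows. The `ph` values are invented purely for illustration (they are not real measurements), the `pearson` helper is hand-written, and this does not show mothur's metadata file format or its actual implementation.

```python
from math import sqrt

# OTU counts per sample, from the example table earlier in the thread.
otus = {
    "Otu001": [422, 1012, 948, 967, 350, 744, 488, 470, 414, 492, 0],
    "Otu002": [413, 186, 439, 468, 316, 839, 609, 621, 526, 707, 0],
}

# HYPOTHETICAL metadata: one made-up pH value per sample, in the same sample order.
ph = [6.8, 7.4, 7.2, 7.3, 6.7, 7.0, 6.9, 6.9, 6.8, 6.9, 7.0]


def pearson(a, b):
    """Pearson correlation between two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


# Correlate every OTU against the metadata column.
otu_vs_ph = {name: pearson(counts, ph) for name, counts in otus.items()}

for name, r in otu_vs_ph.items():
    print(name, "vs pH:", round(r, 4))
```

The important practical detail is that the metadata values must line up with the same samples, in the same order, as the OTU counts.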


Great, I did not see that earlier for some reason. Thanks!