Cscore in Co-occurrence

Hi,

Sorry, I need some help checking if what I have done and my interpretations are correct.

Background: I have created co-occurrence networks for four different soil types using the most abundant bacterial OTUs from the whole data set (244 of the 44005 OTUs formed at 97% similarity). In the networks, nodes are OTUs and edges represent significant Pearson’s correlation coefficients (Benjamini-Hochberg corrected p<0.05). I then came across the concept of the cscore, which I understand tests whether the associations in my networks are random or non-random, and so helps me infer the relative influence of deterministic and stochastic processes on the community (Barberán, 2012, ISME).
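For reference, the Benjamini-Hochberg adjustment mentioned above can be sketched in plain Python. This is a generic illustration and not the code used to build these networks; the function name `benjamini_hochberg` and the example p-values are made up for the demonstration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four OTU pairs.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
# Keep an edge only if its adjusted p-value is below 0.05.
keep_edge = [p < 0.05 for p in adj]
print(adj, keep_edge)
```

An edge would then be drawn only for OTU pairs whose adjusted p-value passes the 0.05 threshold.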

What I have done so far: I edited my shared file to contain only the OTUs of interest and adjusted the numOTUs column to 244. I then ran the following command for each soil type (12 replicates each), for example:

cooccurrence(shared=NetworkOTUsShared.txt, metric=cscore, matrixmodel=sim9, label=0.03, groups=1B1-1B2-1B3-1B4-2B1-2B2-2B4-2B5-3B1-3B2-3B3-3B4)

0.03
Initial c score: 0.478732
average metric score: 0.480671
zscore: -0.535602
standard deviation: 0.00362088
non-parametric p-value: 0.67

Output File Names:
NetworkOTUsShared.cooccurence.summary
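
As a quick sanity check on the output above, the z-score should be the observed score minus the mean of the randomized scores, divided by their standard deviation. The short sketch below redoes that arithmetic from the printed values; the function name `standardized_effect_size` is only illustrative, and the small difference from the reported -0.535602 is presumably due to rounding of the printed inputs.

```python
def standardized_effect_size(observed, null_mean, null_sd):
    """z-score of an observed metric relative to a null distribution."""
    return (observed - null_mean) / null_sd


# Values copied from the cooccurrence output above.
z = standardized_effect_size(0.478732, 0.480671, 0.00362088)
print(round(z, 2))  # about -0.54
```

A z-score this close to zero says the observed score sits well inside the spread of the randomized scores.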

I chose sim9 because other papers using similar pipelines kept both rows and columns fixed, and that seems to suit my data, which I don’t think is a degenerate matrix.
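
To illustrate what "fixed rows and columns" means, here is a minimal sketch of a checkerboard-swap randomization that shuffles a presence/absence matrix while keeping every row total and column total unchanged. It only shows the general idea behind a fixed-fixed null model; it is not the algorithm mothur or EcoSimR actually uses, and the function name `swap_randomize`, the number of swaps, and the toy matrix are all assumptions.

```python
import random


def swap_randomize(matrix, n_swaps=1000, seed=0):
    """Shuffle a 0/1 matrix while preserving all row and column sums."""
    rng = random.Random(seed)
    m = [row[:] for row in matrix]  # work on a copy
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_swaps):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        a, b = m[r1][c1], m[r1][c2]
        c, d = m[r2][c1], m[r2][c2]
        # Only a 2x2 checkerboard (10/01 or 01/10) can be flipped
        # without changing any row or column total.
        if a == d and b == c and a != b:
            m[r1][c1], m[r1][c2] = b, a
            m[r2][c1], m[r2][c2] = d, c
    return m


# Toy presence/absence matrix (rows = OTUs, columns = samples).
original = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]
shuffled = swap_randomize(original, n_swaps=200, seed=1)
print(shuffled)
```

Repeating this many times and recomputing the metric on each shuffled matrix gives the null distribution that the observed score is compared against.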

Questions:

  • For all 4 soil types, the cscores were around 0.4. This seems a lot lower than other reported cscores (3 - 185). Do I have a very low number of checkerboards in my data sets?

  • I also want to check: does p>0.05 mean the associations in my networks are likely to be random? When I run the same command on the original shared file containing all OTUs, all my soil types return p-values of 0, which would mean they are all non-random? I’m having difficulty explaining why, for a given soil type, all OTUs together exhibit a non-random pattern while the most abundant and significant ones that made the cut for the networks appear to be random.

  • Lastly, some p-values are returned as 0. Does this mean <0.001?
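
On the last question: with a randomization test, a reported p-value of 0 usually just means that none of the randomized matrices produced a score as extreme as the observed one, so the true value is below the resolution of the test (less than 1 divided by the number of randomizations) rather than exactly zero. The sketch below illustrates this under stated assumptions: it is not mothur's code, the one-sided direction of the comparison is assumed, and the 1000 null scores are invented.

```python
def empirical_p(observed, null_scores):
    """Fraction of null scores at least as large as the observed score."""
    hits = sum(1 for s in null_scores if s >= observed)
    return hits / len(null_scores)


def conservative_p(observed, null_scores):
    """Same idea, counting the observed score itself so p is never 0."""
    hits = sum(1 for s in null_scores if s >= observed)
    return (hits + 1) / (len(null_scores) + 1)


# Hypothetical null distribution from 1000 randomizations.
null_scores = [0.40 + 0.0001 * i for i in range(1000)]
observed = 0.60  # larger than every null score

p_raw = empirical_p(observed, null_scores)
p_safe = conservative_p(observed, null_scores)
print(p_raw, p_safe)
```

Under these assumptions, a raw value of 0 from 1000 randomizations would be reported as p<0.001.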

I also want to say thank you so much for everything that Mothur has provided over the years. This is my first time asking for help but about my millionth of reading, using and learning from your website and program.

Hi again,

I see quite a few people have viewed my post above, maybe in search of similar answers, so I’ll update with what I ended up doing. I used the EcoSimR package within R as per this website: ftp://cran.r-project.org/pub/R/web/packages/EcoSimR/vignettes/CoOccurrenceVignette.html
This uses presence/absence data, so I edited my shared file to contain only 1s and 0s, as in the example below, and saved it as a comma-separated values (csv) file (Species=OTUs, Samples=A-E):

Species A B C D E
Otu000001 1 1 0 1 1
Otu000002 1 1 1 1 1
Otu000003 1 1 1 1 1
Otu000004 0 1 0 1 1
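
For anyone wanting to see what is being computed, here is a minimal Python sketch of the presence/absence conversion and of the C-score as it is commonly defined: for each pair of species, (Ri - S)(Rj - S), where Ri and Rj are the number of samples each species occurs in and S is the number they share, averaged over all species pairs. This is an illustration only and not the EcoSimR or mothur implementation; the function names and the count data are hypothetical, while `example` is the four-OTU excerpt above.

```python
from itertools import combinations


def to_presence_absence(counts):
    """Turn a table of read counts into 1s (present) and 0s (absent)."""
    return [[1 if x > 0 else 0 for x in row] for row in counts]


def c_score(matrix):
    """Mean number of checkerboard units over all pairs of rows (species)."""
    units = []
    for row_i, row_j in combinations(matrix, 2):
        shared = sum(a & b for a, b in zip(row_i, row_j))
        units.append((sum(row_i) - shared) * (sum(row_j) - shared))
    return sum(units) / len(units)


# The four-OTU excerpt shown above (rows = OTUs, columns = samples A-E).
example = [
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 1],
]
# Two species that never co-occur form one perfect checkerboard.
checkerboard = [[1, 0], [0, 1]]

print(to_presence_absence([[12, 0, 3], [0, 0, 7]]))
print(c_score(example), c_score(checkerboard))
```

The excerpt scores 0 because no pair of these OTUs is segregated across samples, whereas the two-species checkerboard scores 1.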


I'm really not a statistician and not really a bioinformatician but I found this easy enough to do and being able to visualise the results gave me confidence that I was interpreting them correctly. The results were quite different from what I got from mothur but did seem to make more sense for my data. I must say though that I am very thankful for mothur, as I would probably still be trying to unpack my fastq files if it wasn't for this community.