PCoA analysis

Hey everyone,

I’m stuck visualizing my data and hope you can help. I ran the PCoA through mothur, and the results are below. I’m not quite sure whether this is acceptable. In most of the papers I’ve read, the first two dimensions represent >40% of the variation, so why do mine explain only 7.9% and 6.5%? I don’t think picking just the first two or three axes would give a good representation of the real data. Can I trust the data, or did I miss something?

I also ran the NMDS analysis, but the stress is quite high even with 3 dimensions (still 0.28). So I don’t know where to go from here.

axis loading
1 7.920427
2 6.498209
3 4.973317
4 4.646727
5 3.985123
6 3.878085
7 3.302453
8 3.023176
9 2.577917
10 2.479469
11 2.371049
12 2.323850
13 2.234743

thanks a ton in advance!
Jinxin
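For reference, here is a minimal Python sketch that adds up the loadings posted above. It assumes the values are percentages of variation per axis, which is how the post reads them. Under that assumption the first two axes cover about 14.4% and all 13 listed axes together cover about 50.2%.

```python
from itertools import accumulate

# Axis loadings copied from the post above
# (assumed to be percent of variation explained per PCoA axis).
loadings = [
    7.920427, 6.498209, 4.973317, 4.646727, 3.985123,
    3.878085, 3.302453, 3.023176, 2.577917, 2.479469,
    2.371049, 2.323850, 2.234743,
]

# Running total of variation explained by the first k axes.
cumulative = list(accumulate(loadings))

for axis, (pct, cum) in enumerate(zip(loadings, cumulative), start=1):
    print(f"axis {axis:2d}: {pct:5.2f}%   cumulative: {cum:5.2f}%")
```

A slowly decaying list like this one suggests the variation is spread thinly over many axes rather than concentrated in two or three.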

Which distance measure did you use? Also, what is your study design, and what sort of sample replication is in your data?

Thanks very much for your response! I used thetayc to calculate the distances, and these are soil samples. I have three groups with three samples each. All the samples were collected across 6 time points, for 54 samples in total. Looking forward …thanks again for your help.
Jinxin

Have you looked at the OTU distribution in your data? I don’t know exactly how well this would translate into your data, but I suspect that if you have a lot of OTUs unique to each sample this would lead to a less powerful signal in the ordination.

Do you have a heatmap of your data, or a Venn diagram of the shared OTUs between samples?
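One quick way to check this is to count how many OTUs occur in only one sample. The sketch below uses a made-up count table (rows are samples, columns are OTUs); the numbers are purely hypothetical, and you would need to load your own shared file into an array of the same shape.

```python
import numpy as np

# Hypothetical toy count table: rows = samples, columns = OTUs.
counts = np.array([
    [10, 0, 3, 0, 0],
    [ 8, 5, 0, 0, 0],
    [12, 0, 0, 7, 1],
])

present = counts > 0

# In how many samples does each OTU occur?
samples_per_otu = present.sum(axis=0)

# OTUs that occur in exactly one sample.
single_sample_otus = samples_per_otu == 1

# Per sample: number of OTUs found nowhere else, and the
# fraction of that sample's reads that belong to them.
unique_per_sample = (present & single_sample_otus).sum(axis=1)
frac_reads_unique = (counts * single_sample_otus).sum(axis=1) / counts.sum(axis=1)

print("samples per OTU:       ", samples_per_otu)
print("unique OTUs per sample:", unique_per_sample)
print("fraction of reads:     ", np.round(frac_reads_unique, 3))
```

If most reads in each sample sit in OTUs that no other sample shares, abundance-based distances such as thetayc will be close to 1 for most pairs, and no low-dimensional ordination can summarize them well.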

I do not know if this helps, but when you use the “std” and “ave” phylip files, you get different results…

nmds(phylip=16SMAPAQcocc.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.agc.unique_list.thetayc.0.03.lt.std.dist, mindim=3, maxdim=3)

Lowest stress : 0.321701
R-squared for configuration: 0.347498


compared to

nmds(phylip=16SMAPAQcocc.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.agc.unique_list.thetayc.0.03.lt.ave.dist, mindim=3, maxdim=3)
Number of dimensions: 3
Lowest stress : 0.15837
R-squared for configuration: 0.843693

std is the standard deviation of your dissimilarity matrices. You want to use the average dissimilarity. High stress indicates that there’s no structure in your data. It’s actually not surprising (and good from an experiment-quality standpoint) that the stress is high when you are looking at std.
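To make the difference concrete, here is a small Python sketch. It assumes the ave and std files are the element-wise mean and standard deviation over distance matrices from repeated subsampling; that is my reading of the explanation above, not something checked against the mothur source. The matrix `true_dist`, the noise level, and the 100 iterations are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" distances among 4 samples (symmetric, zero diagonal).
true_dist = np.array([
    [0.0, 0.2, 0.7, 0.8],
    [0.2, 0.0, 0.6, 0.9],
    [0.7, 0.6, 0.0, 0.3],
    [0.8, 0.9, 0.3, 0.0],
])

# Simulate 100 subsampling iterations: each one is the true matrix
# plus small symmetric noise (assumed noise model, illustration only).
noise = rng.normal(0.0, 0.02, size=(100, 4, 4))
noise = (noise + noise.transpose(0, 2, 1)) / 2
iters = true_dist + noise
idx = np.arange(4)
iters[:, idx, idx] = 0.0  # keep self-distances at zero

ave = iters.mean(axis=0)          # average dissimilarity: what you ordinate
std = iters.std(axis=0, ddof=1)   # spread across iterations: a noise estimate

print(np.round(ave, 2))
print(np.round(std, 3))
```

In this toy case the ave matrix recovers the distances among samples, while the std matrix only records how much each distance wobbles between iterations, so there is no community structure in it for NMDS to find.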

Use NMDS unless you know that the underlying gradient is linear (I’ve never seen a compelling case suggesting linearity for any microbial community data).