Do sequences need to be subsampled to the same sequencing depth before estimating alpha diversity?
Is it right that communities with different numbers of sequences cannot be compared?
I used the mothur commands rarefaction.single and rarefaction.shared to generate the rarefaction curves, and summary.single and summary.shared to generate the sobs-shannon-chao1 values. Do I have to bring all my samples (groups) to the same number of reads before I calculate the sobs-shannon-chao1 values? If yes, why?
Any suggestions will be greatly appreciated!
That is correct - many of the alpha (and beta) diversity metrics are highly sensitive to sampling effort. So if you have different numbers of reads in each sample, you need to rarefy the samples to a common number of sequences. We demonstrate this in the various SOPs with the subsample option of the summary.single and dist.shared commands.
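To see why sampling effort matters, here is a minimal Python sketch (not mothur code) that sequences the same made-up community at two depths and compares observed richness before and after rarefying. The community, the read counts, and the helper names (`rarefy`, `sobs`, `shannon`, `draw_sample`) are all assumptions for illustration; only the depth of 2042 is borrowed from the example later in this thread.

```python
import math
import random


def rarefy(counts, depth, rng):
    """Randomly keep `depth` reads (without replacement) from one sample."""
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    kept = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        kept[otu] += 1
    return kept


def sobs(counts):
    """Observed richness: number of OTUs with at least one read."""
    return sum(1 for n in counts if n > 0)


def shannon(counts):
    """Shannon index using the natural log."""
    total = sum(counts)
    return -sum(n / total * math.log(n / total) for n in counts if n > 0)


def draw_sample(weights, n_reads, rng):
    """Simulate sequencing: draw n_reads from a community with the given OTU weights."""
    counts = [0] * len(weights)
    for otu in rng.choices(range(len(weights)), weights=weights, k=n_reads):
        counts[otu] += 1
    return counts


rng = random.Random(1)
# Hypothetical skewed community of 500 OTUs (an assumption, not real data)
community = [1.0 / (rank + 1) for rank in range(500)]
deep = draw_sample(community, 20000, rng)
shallow = draw_sample(community, 2042, rng)

print("raw      sobs:", sobs(deep), "vs", sobs(shallow))
print("rarefied sobs:", sobs(rarefy(deep, 2042, rng)), "vs", sobs(shallow))
```

Both samples come from the same community, so any large gap in the raw richness values is an effect of read depth rather than biology. After the deeper sample is rarefied to 2042 reads, the two values can be compared on equal footing.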
Could I ask a follow up question?
So, if we have a number of samples with different numbers of sequences, we should calculate the alpha diversity based upon the sequences assigned to each OTU after rarefaction? Then generate the OTUs from the rarefied samples (based on the sequences) to estimate the beta diversity?
Would that be correct? I am a bit lost with regard to the specific files to use to generate these stats.
So let’s assume you generated a shared file with make.shared called test.shared. In it are 100 samples with varying numbers of sequences. Let’s assume the minimum number of sequences in any sample was 2042. To get the number of OTUs and Shannon diversity you would do…
summary.single(shared=test.shared, calc=sobs-shannon, subsample=2042)
To get beta diversity (like a distance matrix) you would do…
dist.shared(shared=test.shared, calc=thetayc-jclass, subsample=2042)
You would then use the data in the ave.dist file.
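The subsample value used above (2042) is the read count of the smallest sample. As a rough sketch, that number could be found from a shared file in Python as shown here. This assumes the usual tab-delimited shared layout (label, Group, numOtus, then one count column per OTU) and a file with a single label; the function name `sample_depths` and the tiny example data are made up for illustration. mothur's own count.groups command should report the same per-sample totals.

```python
import csv
import io


def sample_depths(shared_text):
    """Return {group: total reads} from the text of a mothur shared file.

    Assumes columns: label, Group, numOtus, then one count column per OTU,
    and that the file holds a single label (otherwise groups would repeat).
    """
    reader = csv.reader(io.StringIO(shared_text), delimiter="\t")
    header = next(reader)
    group_col = header.index("Group")
    first_otu = header.index("numOtus") + 1
    return {
        row[group_col]: sum(int(x) for x in row[first_otu:])
        for row in reader
        if row
    }


# Tiny made-up shared file with three samples and four OTUs
example = (
    "label\tGroup\tnumOtus\tOtu1\tOtu2\tOtu3\tOtu4\n"
    "0.03\tA\t4\t10\t5\t0\t1\n"
    "0.03\tB\t4\t3\t3\t2\t0\n"
    "0.03\tC\t4\t20\t0\t7\t4\n"
)
depths = sample_depths(example)
smallest = min(depths, key=depths.get)
print(smallest, depths[smallest])
```

The smallest total is the largest value that can be passed to subsample without dropping any sample. If one sample is far shallower than the rest, it may be worth excluding it and using the next smallest total instead.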
Thank you very much. Like I said, I was a bit confused...