# How can I calculate richness and rarefaction curves correctly?

I guess this is a statistical question, and I hope someone can answer it. It has confused me for a long time. What do you normally do? I want to know the working rules for publication, so that reviewers won't think there is statistical bias.

I have 10 microbial samples from 10 different elevations. I want to calculate the alpha diversity for each sample. Then I want to make several plots:
1> the Chao1 estimate against elevation
2> the Simpson estimate against elevation
3> the observed OTUs against elevation

I hope to find a linear relationship in these plots (the number of OTUs declining as elevation increases).

However, I don't know whether I need to subsample my 10 samples to account for the difference in sampling effort (the number of sequences obtained in each sample). My sample sizes vary from 1000 to 5000 sequences.

I know that for beta diversity I should always subsample. Although this is alpha diversity, I plot the richness estimates from all samples together, so it is more like a beta diversity analysis (a comparison between samples). So I guess I need to subsample, but I don't know if I am right.

My friends told me that whenever I use a richness or diversity estimate such as Chao1 or Simpson, I have to subsample to an equal number of reads, but that for observed species I don't have to. However, if I subsample for the Chao1 and Simpson calculations, I think I also have to subsample for the observed OTU calculation, because I plot them together to compare.

The second question is about the rarefaction curve. If I want to make a rarefaction curve, do I need to subsample to equal size? For example, I want my x-axis to be the number of reads in each sample and my y-axis to be the number of species at 97% similarity.

If I don't need equal sampling effort, there is another question about how the rarefaction curves look. Some samples' curves will end at 1000 reads and others will end at 5000 reads. If I plot them together in one figure, the rarefaction curves will look ugly. Can I just take the median of my sample sizes, such as 3000 reads, as the limit for plotting the rarefaction curves?

Please leave a message and tell me what you normally do. Discussion is welcome!

Your friend is correct. You need to subsample for everything.
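
To make "subsample for everything" concrete, here is a minimal Python sketch, not mothur's own implementation. It assumes each sample is a list of per-OTU read counts (the two samples below are made-up numbers), draws reads without replacement down to the smallest library size, and then computes all three metrics from the same subsampled counts. The function names are hypothetical, and the formulas are the bias-corrected Chao1 and the Simpson index D; the exact variants mothur's calculators use may differ in detail.

```python
import random
from collections import Counter

def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement from a list of per-OTU counts."""
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    tallied = Counter(rng.sample(reads, depth))
    return [tallied.get(otu, 0) for otu in range(len(counts))]

def observed_otus(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in counts if n > 0)

def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1*(F1-1) / (2*(F2+1))."""
    f1 = sum(1 for n in counts if n == 1)  # singletons
    f2 = sum(1 for n in counts if n == 2)  # doubletons
    return observed_otus(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))

def simpson(counts):
    """Simpson index D = sum n_i(n_i-1) / (N(N-1)); lower D means more diverse."""
    total = sum(counts)
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))

# Hypothetical OTU count vectors: 1000 reads and 5000 reads.
sample_a = [500, 300, 100, 50, 30, 10, 5, 3, 1, 1]
sample_b = [2000, 1500, 800, 400, 200, 50, 30, 10, 5, 3, 1, 1]

depth = min(sum(sample_a), sum(sample_b))  # smallest library size
rng = random.Random(42)                    # fixed seed for reproducibility
for name, counts in (("a", sample_a), ("b", sample_b)):
    sub = subsample(counts, depth, rng)
    print(name, observed_otus(sub), round(chao1(sub), 2), round(simpson(sub), 4))
```

The point of the design is that observed OTUs, Chao1 and Simpson are all computed from the same equal-depth draw, so the three plots against elevation are comparable.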

If I do subsample, what is the best way to do it? I could subsample once and take that value, or I could subsample multiple times (e.g. 1000 times) and report the mean ± SD.

Please also look at my second question, about the rarefaction curve. If I plot observed species vs. number of reads, should I also subsample?

For alpha and beta diversity use the subsample options in summary.single and summary.shared.
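
For the "subsample once or many times" question, the idea of repeated subsampling can be sketched in Python as follows. This is only an illustration of the averaging, not how mothur does it internally; the count vector, the function names and the choice of 200 iterations are all assumptions made for the example.

```python
import random
import statistics

def subsample_richness(counts, depth, rng):
    """Observed OTUs in one random draw of `depth` reads (without replacement)."""
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    return len(set(rng.sample(reads, depth)))

def repeated_subsample(counts, depth, iters=1000, seed=0):
    """Mean and sample standard deviation of richness over `iters` random draws."""
    rng = random.Random(seed)
    values = [subsample_richness(counts, depth, rng) for _ in range(iters)]
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical sample with 5000 reads, rarefied to 1000 reads.
sample = [2000, 1500, 800, 400, 200, 50, 30, 10, 5, 3, 1, 1]
mean_sobs, sd_sobs = repeated_subsample(sample, depth=1000, iters=200)
print(f"observed OTUs at 1000 reads: {mean_sobs:.2f} +/- {sd_sobs:.2f}")
```

A single draw depends on the random seed, so averaging over many draws gives a more stable value, and the standard deviation shows how much a single draw could have varied.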

I would probably make the x-axis go to the smallest library size and use that as the upper limit for the y-axis as well.
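
As a rough Python sketch of cutting every curve at the smallest library size: the code below uses the analytic (Hurlbert) expectation for the number of OTUs at each depth instead of random draws, which is a swapped-in technique and not necessarily what mothur computes. The sample names, count vectors and step size are hypothetical.

```python
from math import comb

def rarefaction_curve(counts, max_depth, step):
    """Expected OTU count at each depth (Hurlbert's analytic rarefaction)."""
    total = sum(counts)
    depths = list(range(step, max_depth + 1, step))
    curve = []
    for n in depths:
        denom = comb(total, n)
        # An OTU with c reads is missed with probability C(total-c, n) / C(total, n).
        expected = sum(1 - comb(total - c, n) / denom for c in counts if c > 0)
        curve.append(expected)
    return depths, curve

# Hypothetical samples with 5000 and 1000 reads.
samples = {
    "low_elevation": [2000, 1500, 800, 400, 200, 50, 30, 10, 5, 3, 1, 1],
    "high_elevation": [500, 300, 100, 50, 30, 10, 5, 3, 1, 1],
}

# Every curve stops at the smallest library size, so all curves share one x-range.
x_max = min(sum(counts) for counts in samples.values())
curves = {name: rarefaction_curve(counts, x_max, step=100)
          for name, counts in samples.items()}

for name, (depths, curve) in curves.items():
    print(name, depths[-1], round(curve[-1], 2))
```

The `depths` and `curve` lists for each sample can be passed directly to any plotting library as x and y values.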