# A question about the NAs when generating rarefaction curves

Dear all,

Good evening! I am a real newbie to Mothur, and I would like to ask a question about rarefaction curves. Thanks.

After running the rarefaction.single command, I got the rarefaction file, which looks like this (fake data for demonstration):

++A = # of sequences sampled; ++B = # of OTUs identified for sample 1; ++C = # of OTUs identified for sample 2

++A ++B ++C
001 001 001
100 30 50
200 90 110
300 130 210
400 132 320
500 132 440
600 NA 560
700 NA 560
800 NA 560
900 NA NA
1,000 NA NA

First, based on my understanding, those NAs appear because we cannot get more OTUs when sampling the reads further. When I draw the rarefaction curve for one specific sample, can I replace those NAs with the maximum number of OTUs for this sample? For instance, when drawing the rarefaction curve for sample 1, can I use the following data?

++A = # of sequences sampled; ++B = # of OTUs identified for sample 1

++A ++B
001 001
100 30
200 90
300 130
400 132
500 132
600 132
700 132
800 132
900 132
1,000 132

Here, instead of drawing the rarefaction curve with the x-axis going to 500, I really want it to go to 1,000.

Second, in my data, sample 1 and sample 2 were taken from the same location (they are similar to each other) but were sequenced individually. Now, if I want to treat sample 1 and sample 2 as a single sample, can I take the average number of OTUs and generate the rarefaction curve this way:

++A = # of sequences sampled; ++B = # of OTUs identified for sample 1; ++C = # of OTUs identified for sample 2; ++D = Average # of OTUs for sample 1 and sample 2

++A ++B ++C ++D
001 001 001 001
100 30 50 40
200 90 110 100
300 130 210 170
400 132 320 226
500 132 440 286
600 132 560 346
700 132 560 346
800 132 560 346
900 132 560 346
1,000 132 560 346

Thanks a lot for any thoughts and input. And I hope that everyone will have a wonderful weekend.

Yours,
Zhang Chiqian

No, I would not replace the NAs with the maximum number of OTUs to that point. The idea of rarefaction is that you are subsampling downwards, not upwards as your method would effectively do. If you are plotting, you can remove those rows that have NAs for that sample. I describe how to do this in R within my minimal R materials section on line plots.
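As a rough illustration of dropping the NA rows per sample before plotting, here is a minimal sketch in plain Python rather than R. It is not the code from the R materials mentioned above; the variable names and the use of `None` for NA are assumptions, and the numbers are the fake data from the question.

```python
# Fake data from the question; None stands in for the NA entries.
depths = [1, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
sample1 = [1, 30, 90, 130, 132, 132, None, None, None, None, None]
sample2 = [1, 50, 110, 210, 320, 440, 560, 560, 560, None, None]


def drop_na(depths, otus):
    """Keep only the (depth, OTU count) pairs that have a value."""
    return [(d, n) for d, n in zip(depths, otus) if n is not None]


# Each sample keeps its own rows, so each curve simply ends
# at the last depth for which that sample has a value.
curve1 = drop_na(depths, sample1)
curve2 = drop_na(depths, sample2)

print(curve1[-1])  # (500, 132)
print(curve2[-1])  # (800, 560)
```

Each curve is then plotted over its own x-range, so sample 1 stops at 500 and sample 2 at 800 instead of being padded out to 1,000.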

Hope this helps a bit,
Pat

Thanks a lot for the help! I now understand that I should remove the rows having NAs. My specific issue right now is: I have too many samples and thus too many curves/lines. In the link that you gave, it was suggested that we can reduce the number of samples using. However, I am afraid that I might lose some information by reducing the number of samples when plotting the rarefaction curves. That was why I initially wanted to calculate the average number of OTUs for a group of samples.

Do you think it is correct to calculate the average number of OTUs for a group of samples? If it is correct, how can we deal with the NAs? In the example below, sample 1 starts to get NAs at 600 reads, whereas sample 2 starts to get NAs at 900 reads. If we remove the NAs, there is no way to calculate the average number of OTUs from 600 reads onward.

++A ++1 ++2
001 001 001
100 30 50
200 90 110
300 130 210
400 132 320
500 132 440
600 NA 560
700 NA 560
800 NA 560
900 NA NA
1,000 NA NA
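To make the problem concrete, here is a small sketch in plain Python using the fake data above (the names and the use of `None` for NA are assumptions). It averages the samples row by row and stops at the first depth where any sample is NA, so the averaged curve cannot extend past the shallowest sample. Whether such an averaged curve is meaningful at all is a separate question that this sketch does not settle.

```python
# Fake data from the table above; None stands in for NA.
depths = [1, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
sample1 = [1, 30, 90, 130, 132, 132, None, None, None, None, None]
sample2 = [1, 50, 110, 210, 320, 440, 560, 560, 560, None, None]


def mean_curve(depths, *samples):
    """Average OTU counts per depth, stopping at the first depth
    where at least one sample has no value."""
    points = []
    for d, counts in zip(depths, zip(*samples)):
        if any(c is None for c in counts):
            break
        points.append((d, sum(counts) / len(counts)))
    return points


avg = mean_curve(depths, sample1, sample2)
print(avg[-1])  # (500, 286.0)
```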

Any comments and suggestions are appreciated.

I never look at rarefaction curves. Use summary.single instead with the number of sequences you want the data rarefied to. We go through this in the MiSeq SOP.
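The idea of rarefying every sample to one common number of sequences can be sketched in plain Python as follows. This is only an illustration of the concept, not mothur's implementation and not the SOP's commands; the OTU abundance tables, the function name `rarefied_sobs`, and the choice of the smallest sample as the depth are all hypothetical.

```python
import random


def rarefied_sobs(otu_counts, depth, iters=1000, seed=0):
    """Mean number of OTUs observed when `depth` reads are drawn
    without replacement from a sample, averaged over `iters` draws."""
    # Expand the abundance table into one entry per read.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    total = 0
    for _ in range(iters):
        total += len(set(rng.sample(reads, depth)))
    return total / iters


# Hypothetical OTU abundance tables (reads per OTU), not real data.
sample1 = {"Otu1": 300, "Otu2": 150, "Otu3": 40, "Otu4": 10}               # 500 reads
sample2 = {"Otu1": 400, "Otu2": 250, "Otu3": 100, "Otu4": 40, "Otu5": 10}  # 800 reads

# Rarefy both samples down to the size of the smaller one.
depth = min(sum(sample1.values()), sum(sample2.values()))
for name, sample in (("sample1", sample1), ("sample2", sample2)):
    print(name, rarefied_sobs(sample, depth))
```

Every sample is then compared at the same depth, which gives one number per sample instead of one curve per sample, and the NA question does not arise.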

Pat

I see. Thanks for the reply. Have a nice weekend!
