Use of the mock dataset and a suggested graphing package

Hi, Patrick and anyone who might know,

I have been using mothur to analyze my own data for a while, and I have two questions that, as far as I can tell, have not been answered anywhere before:

  1. The mock control dataset is strongly recommended in the SOP and in many forum posts. If I understand it correctly, the mock control (a technical control?) is a sequencing run of a known microbial community. I am using a community of 10 known species as my mock group, and after running classify.otu I found that those ten bacterial species are the top ten identified in the *.taxonomy file. However, I still got a large number of other OTUs. The 11th and following species (around 800 more) have much smaller OTU sizes than the top 10, which is good. My question is: should we use the size of the 11th species as a cut-off value to separate truly present bacteria from sequencing noise in my real samples? In the MiSeq SOP, the mock group is discarded using get.groups and never used again in the downstream analysis once it is known that 35 OTUs were identified from the Mock community (in reality it had 20 OTUs). Is this the only use for the mock group data?

  2. This one is about plotting tools. The SOP says: “This command will generate files ending in *.chao and *.invsimpson for each sample, which can be plotted in your favorite graphing software package.” What is the suggested graphing software package? Are you referring to R? I did find an R tutorial page at http://www.mothur.org/wiki/R_tutorial… but it only covers PCoA and NMDS plots. I have very little background in R. How should I go about plotting the .chao files and so on? Do I need to write my own R code for those plots?

I would really appreciate your reply. Wish you a great day.
Regards.

Cheng

  1. The mock control dataset is strongly recommended in the SOP and in many forum posts. If I understand it correctly, the mock control (a technical control?) is a sequencing run of a known microbial community. I am using a community of 10 known species as my mock group, and after running classify.otu I found that those ten bacterial species are the top ten identified in the *.taxonomy file. However, I still got a large number of other OTUs. The 11th and following species (around 800 more) have much smaller OTU sizes than the top 10, which is good. My question is: should we use the size of the 11th species as a cut-off value to separate truly present bacteria from sequencing noise in my real samples? In the MiSeq SOP, the mock group is discarded using get.groups and never used again in the downstream analysis once it is known that 35 OTUs were identified from the Mock community (in reality it had 20 OTUs). Is this the only use for the mock group data?

No, I wouldn’t suggest doing that since it really isn’t a true community. We use the mock community to assess and report the error rates. That’s all.
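To make "error rate" concrete, here is a rough Python sketch of the idea: count the bases in the mock reads that disagree with the known reference sequences and divide by the total number of bases compared. This is only an illustration under simple assumptions (reads already aligned to their references, equal lengths, gaps written as "-"). The function name `error_rate` is made up for this example, and it is not how mothur computes the number internally; in practice you would let mothur's seq.error command do this for you.

```python
def error_rate(aligned_pairs):
    """Fraction of compared positions where a read differs from its reference.

    aligned_pairs: iterable of (observed, reference) strings that are
    already aligned to each other (same length, '-' for gaps).
    This is a conceptual sketch, not mothur's own calculation.
    """
    mismatches = 0
    total = 0
    for observed, reference in aligned_pairs:
        if len(observed) != len(reference):
            raise ValueError("aligned sequences must have equal length")
        for obs_base, ref_base in zip(observed.upper(), reference.upper()):
            if obs_base == "-" and ref_base == "-":
                continue  # column is a gap in both sequences, nothing to compare
            total += 1
            if obs_base != ref_base:
                mismatches += 1
    return mismatches / total if total else 0.0


# One read with a single substitution over 8 compared bases.
print(error_rate([("ACGTACGT", "ACGTACGA")]))
```

The point is that the mock sample gives you a known truth to compare against, which your real samples cannot.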

  2. This one is about plotting tools. The SOP says: “This command will generate files ending in *.chao and *.invsimpson for each sample, which can be plotted in your favorite graphing software package.” What is the suggested graphing software package? Are you referring to R? I did find an R tutorial page at http://www.mothur.org/wiki/R_tutorial… but it only covers PCoA and NMDS plots. I have very little background in R. How should I go about plotting the .chao files and so on? Do I need to write my own R code for those plots?

R is a very powerful and popular programming language that is widely used for data analysis. You can use R, Excel, or any other plotting tool to plot the data.
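As one example of the "any other tool" route, here is a minimal Python sketch that reads the per-sample files and draws one curve per sample. It assumes each file is tab-delimited with a header row, that the first column of interest is named `numsampled`, and that the estimates you want are in a column named after your OTU label (`0.03` here is just a placeholder). Check the header of your own files before relying on those names. The function names and the output file name are hypothetical, and plotting needs matplotlib installed; the same few steps translate directly to R or Excel.

```python
import csv


def read_collect_file(path, label="0.03"):
    """Return (numsampled, estimate) lists from one tab-delimited file.

    Assumes a header row containing 'numsampled' and a column named
    after the OTU label; both names are assumptions to verify.
    """
    xs, ys = [], []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        x_col = header.index("numsampled")
        y_col = header.index(label)
        for row in reader:
            # Skip short rows and rows with no estimate at this label.
            if len(row) <= max(x_col, y_col) or row[y_col] in ("", "NA"):
                continue
            xs.append(float(row[x_col]))
            ys.append(float(row[y_col]))
    return xs, ys


def plot_collect_files(paths, label="0.03", out_png="collect_curves.png"):
    """Draw one line per file and save the figure as a PNG."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for path in paths:
        xs, ys = read_collect_file(path, label)
        ax.plot(xs, ys, label=path)
    ax.set_xlabel("Number of sequences sampled")
    ax.set_ylabel("Estimate at label " + label)
    ax.legend()
    fig.savefig(out_png, dpi=150)
```

You would call `plot_collect_files` with a list of your *.chao (or *.invsimpson) file paths to get all samples on a single figure.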

Pat

Thanks a lot, Pat, this is very helpful. Cheng