own mock library sequences fasta file

Jinxin · February 9, 2016, 11:32pm

Hi everyone,
I know that I am asking stupid questions, but I’m confused again by your HMP_MOCK.v35.fasta in the MiSeq SOP. I’m using my own mock library.

The SOP indicates that the HMP sequenced 21 isolates in the mock library to calculate the error rates. I have a couple of questions about this and hope that you can help.

Can you please let me know how they generated this fasta file? How did they pick the 16S sequence for a single species? Even E. coli has 7 copies of the 16S gene (rrsA-rrsE, rrsG and rrsH), so which one should we choose? As far as I know, those copies have different sequences. Should we grab the sequences from GenBank or SILVA?
In the fasta file, some names carry numbers, like B. vulgatus 1, B. vulgatus 2, B. vulgatus 4, B. vulgatus 5 and B. vulgatus 7. What are those numbers? Different copies in the same strain? If so, why does E. coli only have E. coli 1?
The SOP works with V4, so the sequences are supposed to be 250 bp in length? Then why are there V3-V5 sequences in this file? Did you use V3 and V5 primers to trim the 16S sequences to get this file?

I’m trying to explain where I was confused and hope it is clear. I am stuck at this step and really hope one of you will be able to help.

Jinxin

Jinxin · February 11, 2016, 11:53pm

I got the answers to my questions. If you are working with your own mock library and have the same questions, please let me know. Thanks,
Jinxin

pschloss · February 12, 2016, 3:17pm

The fasta sequences came from the organisms’ genome sequences and include all of the rrn copies per genome. I think the version we posted is only those sequences that were unique within the V3-V5 region. The full length sequences are available at https://raw.githubusercontent.com/SchlossLab/Kozich_MiSeqSOP_AEM_2013/master/data/references/HMP_MOCK.fasta
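
As a rough illustration of what "unique within a region" could mean, here is a minimal Python sketch. It is not the script used to build HMP_MOCK.v35.fasta; the function name, the toy sequences and the region coordinates are all made up for the example. Under this assumption, copies that are identical inside the region collapse to a single record, which would leave gaps in the copy numbering.

```python
def unique_within_region(records, start, end):
    """Keep the first record for each distinct subsequence in [start, end).

    records: list of (name, sequence) tuples, e.g. all rrn copies of a genome.
    start, end: hypothetical 0-based coordinates of the region of interest.
    """
    seen = set()
    kept = []
    for name, seq in records:
        region = seq[start:end].upper()
        if region not in seen:
            seen.add(region)
            kept.append((name, seq))
    return kept


# Toy data (invented): three copies from one imaginary strain.
copies = [
    ("Strain_X.1", "AAACCCGGGTTT"),
    ("Strain_X.2", "AAACCCGGGTTA"),  # differs only outside the region
    ("Strain_X.3", "AAACCTGGGTTT"),  # differs inside the region
]

kept = unique_within_region(copies, 3, 9)
kept_names = [name for name, _ in kept]
print(kept_names)
```

In this toy case copy 2 is dropped because it matches copy 1 inside the region, so only copies 1 and 3 remain.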

drramganesh · April 14, 2016, 8:51am

Can you please tell me the answer? I have the same problem…

Jinxin · May 20, 2016, 1:19am

Sorry I’m late. The best recommendation is to start with organisms whose genomes are well characterized. From there, grab all of the 16S copies from your organisms, trim them to your region of interest, and do the alignment from there. Thanks,
Jinxin
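
The trimming step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the FASTA text, the strain names and both primers are invented, and the primer search is an exact string match, so it does not handle the ambiguity codes (R, Y, N, ...) or mismatches that real 16S primers usually involve.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def trim_to_amplicon(seq, fwd_primer, rev_primer):
    """Return the sequence between the two primers, or None if either is absent."""
    start = seq.find(fwd_primer)
    if start == -1:
        return None
    start += len(fwd_primer)
    end = seq.find(reverse_complement(rev_primer), start)
    if end == -1:
        return None
    return seq[start:end]


# Hypothetical primers and toy 16S copies; none of these are real sequences.
FWD = "GATTACA"
REV = "CCTTGG"
toy_fasta = """>Strain_X.1
TTGATTACACCCG
GGAAACCAAGGTT
>Strain_X.2
TTGATTACACCCGGTAAACCAAGGTT
"""

amplicons = {
    name: trim_to_amplicon(seq, FWD, REV)
    for name, seq in read_fasta(toy_fasta)
}
print(amplicons)
```

Copies for which a primer is not found come back as None and would need to be checked by hand.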

lydiajleon · May 27, 2016, 1:13pm

Hi there,
I have created my own mock community using DNA from 10 separate bacterial strains purchased from BEI Resources. The website gives the HMP ID and GenBank accession numbers, so I am able to get the whole genome sequence for these particular strains. However, I am now unsure how to proceed to make my own fasta file like the ones created for the standard mock communities from BEI (http://gigadb.org/dataset/view/id/100185), to be used in the mothur protocol for assessing the error rate. Essentially, I want to download just the 16S sequences from the whole genome but don’t know how to do this in the most efficient way… Any suggestions? I have tried searching SILVA etc. to get the curated sequences, but I am worried that these are not specific enough to the strains produced by BEI… Any thoughts on this?
Many thanks in advance!
