Many bacteria have multiple copies of the 16S rRNA gene, and each copy might differ slightly in sequence. In the databases that mothur typically uses (greengenes and SILVA), do some, most, or all copies of a 16S gene map back to the same species?
Also, if the V4 region I sequenced from one of the 16S copies of Species A is not represented in the databases, is it possible that mothur mistakes it for Species B? What are the chances of that happening?
If you’re doing OTUs at a 3% distance level or if you are phylotyping the data, then I wouldn’t worry about intra-genomic variation. I think this is a big problem for the people pushing ASVs, because that approach would have you put different variants from the same genome into different bins.
Thanks, that makes sense. Can we say with confidence that any single species will have 16S copies with sequences no more than 3% different? What if the copy sequenced is not near the centroid? I assume that if two copies are maximally different, each 3% from the centroid in opposite directions, they could be close to 6% dissimilar.
As an aside, if those different ASV bins map back to the same species, then that’s not really a problem, is it? Of course, this assumes that all variants are in the database and that the sequences are accurate.
I’d be reluctant to try to make a claim about distance and taxonomic groupings. It is very rare for a genome to have multiple copies of the 16S rRNA gene that are more than 3% different from each other. For our opticlust OTUs, the sequences within an OTU would not be more than 3% different from each other, so they wouldn’t be 6% different from each other.
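To make the 3% versus 6% point concrete, here is a rough toy sketch. It is not how mothur calculates distances; `p_distance` and the made-up 100-base sequences are only assumptions for illustration. It shows two copies that each sit 3% from a centroid while being 6% from each other, which is the case a pairwise 3% limit within an OTU rules out.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Made-up 100-base sequences, not real 16S data.
centroid = "A" * 100
copy_1 = "C" * 3 + "A" * 97   # 3 mismatches at the start
copy_2 = "A" * 97 + "G" * 3   # 3 mismatches at the end

d1 = p_distance(copy_1, centroid)   # distance of copy_1 to the centroid
d2 = p_distance(copy_2, centroid)   # distance of copy_2 to the centroid
d12 = p_distance(copy_1, copy_2)    # distance between the two copies

threshold = 0.03
print(f"copy_1 vs centroid: {d1:.2f}")
print(f"copy_2 vs centroid: {d2:.2f}")
print(f"copy_1 vs copy_2:   {d12:.2f}")
print("pair within threshold:", d12 <= threshold)
```

Both copies are within 3% of the centroid, but the pair itself exceeds the threshold, so a rule based on pairwise distance would not treat them as belonging together.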
ASVs are a problem if they split copies from the same genome into different bins, because that skews the relative abundances. When combined with potential PCR bias, it would be possible to make an inference about one copy that is different from the inference for the other copies within that genome.
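As a rough illustration of the skew, here is a small sketch with a made-up community. The genome names, copy numbers, and variant labels are all hypothetical, and it assumes equal numbers of cells, reads proportional to 16S copy number, and no PCR bias. It compares the relative abundances you get when every variant is its own bin with the ones you get when the copies from a genome stay together.

```python
from collections import Counter

# Hypothetical genomes: one variant label per 16S copy.
genomes = {
    "genome_A": ["A_var1", "A_var1", "A_var1", "A_var2"],
    "genome_B": ["B_var1"],
}


def relative_abundance(counts):
    """Convert raw counts to fractions of the total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Every distinct variant gets its own bin (ASV-style).
variant_counts = Counter(v for copies in genomes.values() for v in copies)

# All copies from one genome share a bin (assumes its variants cluster together).
genome_counts = Counter({g: len(copies) for g, copies in genomes.items()})

by_variant = relative_abundance(variant_counts)
by_genome = relative_abundance(genome_counts)

print("split by variant:", by_variant)
print("kept together:   ", by_genome)
```

In the split version, genome_A shows up as two bins (0.6 and 0.2), and its minor variant looks like a separate population as abundant as genome_B. Kept together, genome_A is a single bin at 0.8. Copy number inflates genome_A in both cases; splitting adds the extra problem of treating one organism as two.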