Dual indexing for 16S microbiomics

Hi,
I wonder if you recommend using the dual indexing approach (Kozich et al. 2013) for 16S microbiomics.
I was unsure if this method is recommended just for metagenomics.
In our lab we do 16S microbiomics of the V4 region using the MiSeq v3 kit. We have so far been following the EMP protocol.
I am a bioinformatician, and I am attempting to recommend to my lab new ways of generating the dataset that would reduce the error rate.
At the recent ComMet conference, I was struck by how almost every dataset presented was based on dual indexing. But it was also all metagenomics. So, could you advise me whether we need to, and can, implement a dual indexing strategy for 16S microbiomics on human samples?

Thank you very much for your time.
Kind regards,
Brindha.

I wonder if you recommend using the dual indexing approach (Kozich et al. 2013) for 16S microbiomics.

Well, it’s my paper, so of course I do! :) You can find an updated wet lab protocol at https://github.com/SchlossLab/MiSeq_WetLab_SOP and the most up-to-date bioinformatics pipeline at http://www.mothur.org/wiki/MiSeq_SOP

DO NOT USE THE V3 chemistry:

http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix%3F/

Also, remember that metagenomics refers to shotgun sequencing of bulk DNA without prior amplification. What we describe is “16S rRNA gene sequencing” (not metagenomics or “microbiomics”).

Thank you, Dr. Schloss. Your answer will hopefully reassure my lab about adopting dual indexing.
I had read your wet-lab SOP and the Kozich et al. 2013 paper, and shared them with my lab. My manager had asked why dual indexing isn’t more widely used for 16S, if it’s more suitable than single indexing.

Error Rates
I have one more question, if I may.
I am very inspired by the emphasis on error rates in your publications and forum comments.
I would like to implement such error rate measurement as part of my routine analyses.
Can you share with me the formula you use to obtain the error rate?

I imagine it to be ((number of input reads - number of remaining reads) / number of input reads) x 100
And I presume one would calculate it for each step of attrition in the process of getting high quality unique reads, and the number of input reads that would go in the formula is the number of input reads for each step. So, the total error rate will be the sum of all the individual error rates.
Am I correct?

Kind regards,
Brindha.

I would like to implement such error rate measurement as part of my routine analyses.
Can you share with me the formula you use to obtain the error rate?

First, you have to co-sequence a mock community for which you know the true sequences of your fragments. Second, because we know the true sequences, we can predict all possible chimeras; any read that is more similar to a predicted chimera than to a true sequence is removed from the analysis, since a chimera is not a sequencing error. Third, we align the remaining reads to the closest reference sequence. Finally, we count the total number of bases and the total number of mismatched bases across all reads, including duplicates. The ratio of mismatches to total bases is the error rate. Multiply by 100 to get the percent error rate.
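
As a rough illustration of the counting in the last step, here is a minimal Python sketch. It is not mothur's implementation. It assumes the chimeric reads have already been removed and that each read has already been aligned to its closest mock-community reference, so that both strings have the same length. The function name `error_rate`, the input layout (read, reference, number of duplicate copies), and the handling of gaps (a gap opposite a base is counted as a mismatch) are assumptions made for the example.

```python
from typing import Iterable, Tuple


def error_rate(aligned_pairs: Iterable[Tuple[str, str, int]]) -> float:
    """Return mismatched bases / total bases over all reads.

    Each item is (aligned_read, aligned_reference, copies), where
    `copies` is how many identical reads the pair represents, so
    duplicates contribute to both counts.
    """
    total_bases = 0
    mismatches = 0
    for read, ref, copies in aligned_pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference must be aligned to the same length")
        for q, r in zip(read.upper(), ref.upper()):
            if q == "-" and r == "-":
                continue  # gap in both sequences: no base to compare
            total_bases += copies
            if q != r:
                # Assumption: substitutions and gaps opposite a base
                # are both counted as one mismatched position.
                mismatches += copies
    if total_bases == 0:
        raise ValueError("no bases to compare")
    return mismatches / total_bases


# Hypothetical toy data: three identical error-free reads and one
# read with a single substitution, all against the same reference.
pairs = [
    ("ACGTACGT", "ACGTACGT", 3),
    ("ACGTTCGT", "ACGTACGT", 1),
]
rate = error_rate(pairs)
print(f"error rate: {rate:.5f} ({rate * 100:.3f}%)")
```

In the toy data there is 1 mismatch among 32 compared bases. Note that this is a per-base measure taken on the mock community only; it is not the fraction of reads discarded at each filtering step.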