Dear all,
We’ve been implementing the dual-index sequencing approach on a MiSeq (2 x 250 bp) as described in your Kozich et al. (2013) paper, and it’s working pretty well for us. The only problem we seem to have is a large number of undetermined reads: around 33% before sequence processing, and still 24% after preprocessing. So my first question is: has anyone else experienced this with this sequencing strategy?
I could of course demultiplex from scratch using the index files and see whether that makes any difference, but that raised a more theoretical question. By default, the MiSeq demultiplexing software allows 1 mismatch in the barcode. We could allow more mismatches (if the barcode design permits it) so that more sequences are assigned to a sample. But how trustworthy are sequences with mismatches in their barcode? I’d guess that barcode mismatches are less likely to be sequencing errors and more likely to be errors introduced during library preparation (at the PCR level). If such errors show up in the barcode, I assume they also exist in the actual gene sequence. And errors in a 250 bp region (V4), where OTUs are defined by a difference of only about 8 bp (the 97% similarity threshold: 3% of 250 bp is 7.5 bp), seem completely wrong to me.
Long story short, my second question: is it a good idea to demultiplex allowing 1 or 2 mismatches in the barcode, or is it better to keep this at 0 because those sequences are not trustworthy?
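One constraint on "how many mismatches can we allow" is purely combinatorial: if the closest two barcodes in a set differ at d positions, a read can only be assigned unambiguously when at most (d - 1) // 2 mismatches are tolerated. Beyond that, a single read can sit equally close to two barcodes. Below is a minimal Python sketch of Hamming-distance matching for dual indices. It is not the MiSeq software's actual algorithm, and the index names, sequences and sample table are hypothetical, chosen only to make the distances easy to check by hand.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(index_set: Dict[str, str]) -> int:
    """Smallest Hamming distance between any two indices in a set."""
    return min(hamming(a, b) for a, b in combinations(index_set.values(), 2))


def max_safe_mismatches(index_set: Dict[str, str]) -> int:
    """Largest k for which no read can lie within k mismatches of two indices.

    A read can be within k of two indices that differ at d positions only
    if 2k >= d, so k = (d_min - 1) // 2 never produces a tie.
    """
    return (min_pairwise_distance(index_set) - 1) // 2


def match_index(read: str, index_set: Dict[str, str], max_mismatches: int) -> Optional[str]:
    """Name of the single index within max_mismatches of the read, else None."""
    hits = [name for name, seq in index_set.items()
            if hamming(read, seq) <= max_mismatches]
    # No hit, or more than one hit (ambiguous), leaves the read undetermined.
    return hits[0] if len(hits) == 1 else None


def assign_sample(i7_read: str, i5_read: str,
                  i7_set: Dict[str, str], i5_set: Dict[str, str],
                  samples: Dict[Tuple[str, str], str],
                  max_mismatches: int) -> Optional[str]:
    """Dual-index assignment: both index reads must match, and the
    resulting (i7, i5) pair must correspond to a known sample."""
    i7 = match_index(i7_read, i7_set, max_mismatches)
    i5 = match_index(i5_read, i5_set, max_mismatches)
    if i7 is None or i5 is None:
        return None
    return samples.get((i7, i5))


# Hypothetical 8-nt index sets and sample sheet, for illustration only.
I7 = {"i7_1": "ACGTACGT", "i7_2": "TGCATGCA", "i7_3": "ACGTTGCA"}
I5 = {"i5_1": "AACCGGTT", "i5_2": "TTGGCCAA"}
SAMPLES = {("i7_1", "i5_1"): "sample_A", ("i7_3", "i5_2"): "sample_B"}

if __name__ == "__main__":
    print(max_safe_mismatches(I7))               # closest i7 pair differs at 4 positions
    print(match_index("ACGTACGA", I7, 0))        # 1 mismatch, not tolerated
    print(match_index("ACGTACGA", I7, 1))        # 1 mismatch, tolerated
    print(match_index("ACGTTGGT", I7, 2))        # 2 away from both i7_1 and i7_3
    print(assign_sample("ACGTACGA", "AACCGGTA", I7, I5, SAMPLES, 1))
```

With these hypothetical i7 indices the minimum distance is 4, so allowing 1 mismatch is safe but allowing 2 lets "ACGTTGGT" match two indices at once. Note that this only addresses whether the barcode assignment is ambiguous; it says nothing about whether the amplicon itself carries PCR or sequencing errors.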
Thanks