Demultiplexing & undetermined reads (Kozich 2013)

swuyts · August 13, 2015, 9:21am

Dear all,

We’ve been using the dual-index sequencing approach on a MiSeq (2 x 250 bp) as described in your Kozich 2013 paper, and it’s working pretty well for us. The only problem we seem to have is a large proportion of undetermined reads. We get around 33% before sequence processing and still end up with 24% after preprocessing. So my first question is: has anyone else experienced this with this sequencing strategy?

Of course, I could demultiplex from scratch using the index files and see whether that makes any difference, but then I ran into a more theoretical question. The MiSeq demultiplexing software allows 1 mismatch in the barcode by default. We could allow more mismatches (if the barcodes permit it) so that more sequences are assigned to a sample. But how trustworthy are sequences with mismatches in their barcode? I’d guess that mismatches in the barcodes are less likely to be sequencing errors and more likely to be errors from library preparation (PCR level). And since we find mismatches in the barcode, I think such mismatches also exist in the actual gene sequence. Mismatches in a 250 bp region (V4), where OTUs are built on a difference of only about 8 bp (97% OTU definition), seem like a serious problem to me. Long story short, my second question: is it a good idea to demultiplex allowing 1 or 2 mismatches in the barcode, or is it best to keep this at 0 because the sequences are not trustworthy?
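To make the mismatch question concrete, here is a minimal Python sketch of mismatch-tolerant dual-index assignment. It is not the MiSeq software's algorithm, only an illustration under stated assumptions: the index sequences, index names and sample names are made up, and a read pair is assigned only when each index read matches exactly one known index within the allowed number of mismatches.

```python
from itertools import combinations

# Hypothetical 8-nt indices and sample sheet, for illustration only.
I7_KNOWN = {"i7_a": "ATCACGAC", "i7_b": "TGGTCATG"}
I5_KNOWN = {"i5_a": "AACGTGAT", "i5_b": "CGTACTAG"}
SAMPLES = {("i7_a", "i5_a"): "sample1", ("i7_b", "i5_b"): "sample2"}


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_index(read_index, known, max_mismatches=1):
    """Return the name of the single known index within max_mismatches, else None."""
    hits = [name for name, seq in known.items()
            if hamming(read_index, seq) <= max_mismatches]
    # Zero hits or an ambiguous match both count as "no match".
    return hits[0] if len(hits) == 1 else None


def assign_sample(i7_read, i5_read, i7_known, i5_known, samples, max_mismatches=1):
    """Assign a read pair to a sample, or to 'undetermined'."""
    i7 = match_index(i7_read, i7_known, max_mismatches)
    i5 = match_index(i5_read, i5_known, max_mismatches)
    # Index pairs that are not on the sample sheet are also undetermined.
    return samples.get((i7, i5), "undetermined")


def max_safe_mismatches(known):
    """Largest mismatch tolerance at which no read can match two indices."""
    min_dist = min(hamming(a, b) for a, b in combinations(known.values(), 2))
    return (min_dist - 1) // 2
```

For example, `assign_sample("ATCACGAT", "AACGTGAT", I7_KNOWN, I5_KNOWN, SAMPLES, max_mismatches=1)` assigns the read pair to `sample1` despite one mismatch in the i7 read, while the same call with `max_mismatches=0` sends it to `undetermined`. How far the tolerance can be raised depends on the smallest pairwise distance within the index set, which is what `max_safe_mismatches` estimates; it says nothing about whether reads with index mismatches are trustworthy in the amplicon itself.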

Thanks

pschloss · August 13, 2015, 1:41pm

I’m not sure what you mean by “undetermined”. You mean the ones that aren’t assigned to a pair of indices? If that’s the case then these tend to be the PhiX sequencing control (what % PhiX are you loading?). Also, if you have a less than stellar run, a larger number of reads will go in there because the reads are bad. I would not go above their defaults for deconvoluting the sequences to groups.

Pat

swuyts · August 13, 2015, 2:18pm

Indeed, that’s what I mean by undetermined reads. I hadn’t thought about the PhiX; we usually load 10%, and that might indeed explain some of the sequences in there. But after preprocessing, including aligning against the SILVA database and removing chloroplasts, eukaryotes, …, we still have around 24% of preprocessed reads in that group. If it were PhiX, wouldn’t it be removed in these preprocessing steps, since it won’t map to SILVA?

I agree with not going above the defaults, but what’s your opinion on reducing it to no mismatches at all in the barcodes?

Thank you for the reply

pschloss · August 13, 2015, 5:56pm

I think going with no mismatches is probably overly stringent. The data might get scrapped because of a low quality run. When we have accidentally overloaded the DNA on the chip we see a similar problem.

Pat

swuyts · August 17, 2015, 1:26pm

Thank you Pat for your input!
