Reads that do not fully overlap


I was browsing the forum for questions about singletons and saw a post saying that reads that do not fully overlap result in a higher error rate and extra rare OTUs.

"_Re: alpha diversity after remove low abundance
Postby pschloss » Mon Dec 07, 2015 8:17 pm

Another big source of inflated OTU numbers is that you are using the V4-V5 region with 250 PE sequencing. This means that your reads do not fully overlap and that you are going to get an error rate 10-fold higher than you would get with the V4 region. The result? A lot of extra OTUs. We go over this in the Kozich manuscript if you are looking for specifics.

I agree with the earlier posters about removing singletons and doubletons. Furthermore, if your samples don’t have the exact same number of reads (they never do) then one sample that has 1000 reads and another with 10000 will be treated differently if you remove singletons and doubletons. Finally, FWIW, we see some chimeras showing up dozens of times - they are not random artifacts._"
What does "reads do not fully overlap" mean? Does it refer to the regions that fall outside the paired reads during sequencing? And would a single-read MiSeq run therefore generate a lot more singletons?


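Fully overlapping means that both reads span the entire region: if your region is 250 nt, then you need two 250 nt reads that overlap so that every base has 2-fold coverage. Any base that is covered by only one read cannot be checked against the other read, so sequencing errors at those positions are more likely to survive.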
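
To make the arithmetic concrete, here is a rough sketch (not anything from mothur itself) that counts how many bases of an amplicon are covered by both reads, by one read only, or by neither. The function name `coverage_summary` is made up for illustration, and the 375 nt length used for V4-V5 is only an approximate figure I am assuming for the example; the sketch also ignores primers and any quality trimming.

```python
def coverage_summary(amplicon_len, read_len):
    """Return (double, single, uncovered) base counts for a paired-end amplicon.

    Hypothetical helper for illustration: assumes both reads have the same
    length, start at opposite ends of the amplicon, and are not trimmed.
    """
    # Forward read covers positions [0, fwd_end).
    fwd_end = min(read_len, amplicon_len)
    # Reverse read covers positions [rev_start, amplicon_len).
    rev_start = max(amplicon_len - read_len, 0)

    double = max(fwd_end - rev_start, 0)     # covered by both reads
    uncovered = max(rev_start - fwd_end, 0)  # gap between the two reads
    single = amplicon_len - double - uncovered
    return double, single, uncovered


# 250 nt region with 2 x 250 nt reads: every base has 2-fold coverage.
print(coverage_summary(250, 250))

# Assumed ~375 nt region (roughly V4-V5) with 2 x 250 nt reads:
# only the middle of the amplicon is covered twice.
print(coverage_summary(375, 250))
```

With these assumed numbers, the 250 nt region is covered twice at every position, while the longer region has 2-fold coverage only in the middle and single coverage at both ends, which is the "not fully overlapping" situation described in the quoted post.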