Overlapping V3-V4 using v2 2x250 bp

Dear all,
we have sequenced the V3-V4 region using the v2 kit (2x250 bp) and the primers described by Klindworth et al. (2013). I checked after sequencing, and only the locus-specific primer sequences were retained in the generated fastq files (F: CCTACGGGNGGCWGCAG and R: GGATTAGATACCCVHGTAGTC). The initial tails of the primers were removed automatically.
After make.contigs, the summary of the output for the samples was as follows:

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        1       245     245     0       3       1
2.5%-tile:      1       400     400     0       5       24478
25%-tile:       1       440     440     4       7       244777
Median:         1       462     462     8       8       489553
75%-tile:       1       480     480     11      10      734329
97.5%-tile:     1       501     501     20      16      954628
Maximum:        1       502     502     63      217     979105
Mean:           1       460     460     8       9
# of unique seqs:       979105
total # of seqs:        979105

I’m not entirely sure whether I need to remove the locus-specific primer sequences, or whether the reads simply cannot overlap fully (or even minimally).
How can I get the most reliable overlap in this case? How can I fix this so that more sequences survive the subsequent screening and filtering steps?

Thank you so much in advance!!

I’ve just checked, and the entire primer sequences are still present in the resulting file ‘trim.contigs.fasta’ after make.contigs.
How can I solve that?
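
For anyone who wants to repeat this kind of check, here is a minimal Python sketch. It assumes the contigs are read in the same orientation as the forward primer, and the example sequence is made up for illustration; it is not taken from the actual data.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def primer_regex(primer):
    """Compile a degenerate primer into a regex anchored at the sequence start."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))

def starts_with_primer(seq, primer):
    """Return True if seq begins with a match to the degenerate primer."""
    return primer_regex(primer).match(seq.upper()) is not None

FORWARD = "CCTACGGGNGGCWGCAG"  # forward locus-specific primer from the post

# Hypothetical contig start, invented for this example.
example = "CCTACGGGAGGCAGCAGTGGGGAATATTG"
print(starts_with_primer(example, FORWARD))
```

Running this over the first few records of the fasta file is enough to see whether the primer is still attached.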

thank you

Hi Allan,

You need to include an oligos file in make.contigs to remove the primer sequences from the reads. You aren’t going to be able to improve the amount of overlap between the reads since they are what they are.
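
To see why the overlap is fixed, here is a rough Python sketch of the arithmetic. It assumes both reads are the full 250 bp and that the contig length is simply the two reads minus their shared region; the lengths are taken from the summary table posted above.

```python
READ_LENGTH = 250  # 2x250 bp kit; assumes both reads are full length

def overlap(contig_length, read_length=READ_LENGTH):
    """Bases covered by both reads; 0 if the reads do not reach each other."""
    return max(0, 2 * read_length - contig_length)

# Contig lengths (End column) from the summary in the original post.
for label, length in [("2.5%-tile", 400), ("Median", 462), ("97.5%-tile", 501)]:
    print(label, length, overlap(length))
```

Under these assumptions a median contig of 462 bp leaves only about 38 overlapping bases, and contigs near 500 bp have essentially none, so most positions are supported by a single read.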

In a subsequent step you need to use screen.seqs to remove any sequences that have an ambiguous base (maxambig=0) or that are longer than expected (perhaps maxlength=490?). You are likely to have most of your sequences removed because of these requirements.
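
As an illustration of what these two criteria mean, here is a small Python sketch. It is not mothur's implementation; the function name is hypothetical, and counting every non-ACGT IUPAC code as ambiguous is an assumption.

```python
AMBIGUOUS = set("RYSWKMBDHVN")  # IUPAC codes other than A, C, G, T (assumption)

def passes_screen(seq, maxambig=0, maxlength=490):
    """Keep a sequence only if it has few enough ambiguous bases and is short enough."""
    n_ambig = sum(base in AMBIGUOUS for base in seq.upper())
    return n_ambig <= maxambig and len(seq) <= maxlength

# Made-up sequences for illustration.
print(passes_screen("ACGT" * 100))        # 400 bases, no ambiguous bases
print(passes_screen("ACGT" * 100 + "N"))  # one ambiguous base
print(passes_screen("A" * 491))           # longer than maxlength
```

In the summary posted above, the 25%-tile already has 4 ambiguous bases, so fewer than a quarter of the contigs can pass maxambig=0.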

I’d encourage you to consult this blog post on the effects of having minimal overlap between reads.


Hi Pat,
thank you for your reply.
I’ve already read your post about the large distance matrix. However, we only have the 2x250 kit available, and the libraries were prepared by amplifying the V3-V4 region. Unfortunately, I was not involved in that step, when a better alternative could have been chosen.

Regarding the last run, we found that the sequencing kit provided by Illumina was out of date, which surely contributed to the poor sequencing quality. I saw a large number of ambiguous bases, and the Q scores as a whole were very low.
We will now repeat the run using another kit.

