We sequenced the V3-V4 region using the v2 kit (2x250 bp) and the primers described by Klindworth et al. (2013). I double-checked after sequencing, and only the locus-specific part of each primer was retained in the generated fastq files (F: CCTACGGGNGGCWGCAG and R: GGATTAGATACCCVHGTAGTC). The leading tails of the primers were removed automatically.
After running make.contigs, the output for the samples showed the following result:
I’m not entirely sure whether I need to remove this locus-specific part of the primers, or whether the reads simply cannot overlap fully (or even minimally).
How can I get the most reliable overlap in this case? How can I solve this to obtain better coverage for the subsequent screening and filtering steps?
You need to include an oligos file in make.contigs to remove the primer sequences from the reads. You aren’t going to be able to improve the amount of overlap between the reads, since it is fixed by the read length and the length of the amplified fragment.
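To make the two points above concrete, here is a minimal Python sketch. It is not how mothur implements primer removal; it only illustrates the idea of matching a degenerate (IUPAC) primer at the start of a read, and the arithmetic behind the overlap. The function names, the example read, and the fragment length of 460 are hypothetical and chosen only for illustration.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches(primer, read_start, max_diffs=0):
    """Return True if read_start is compatible with the degenerate primer."""
    if len(read_start) < len(primer):
        return False
    # Count read bases that the primer code at that position does not allow.
    diffs = sum(base not in IUPAC[code] for code, base in zip(primer, read_start))
    return diffs <= max_diffs


def trim_primer(read, primer, max_diffs=0):
    """Return the read without its leading primer, or None if no primer is found."""
    if primer_matches(primer, read[:len(primer)], max_diffs):
        return read[len(primer):]
    return None


def expected_overlap(read_length, fragment_length):
    """Bases shared by the two reads of a pair; a negative value means a gap."""
    return 2 * read_length - fragment_length


forward = "CCTACGGGNGGCWGCAG"                 # forward primer quoted in the question
read = "CCTACGGGAGGCAGCAG" + "TGGGGAATATTG"   # made-up read for illustration

print(trim_primer(read, forward))             # TGGGGAATATTG
print(expected_overlap(250, 460))             # 40, for a hypothetical 460 bp fragment
```

The overlap depends only on the read length and the fragment length, which is why trimming primers afterwards cannot increase it.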
In a subsequent step you need to use screen.seqs to remove any sequences that contain an ambiguous base (maxambig=0) or that are longer than expected (perhaps maxlength=490?). You are likely to have most of your sequences removed because of these requirements.
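As a rough illustration of what these two criteria do, here is a small Python sketch of the same filter. It is not mothur's code; the function names and the three example contigs are hypothetical, and it assumes that any base other than A, C, G, or T counts as ambiguous.

```python
def passes_screen(seq, max_ambig=0, max_length=490):
    """Keep a contig only if it has few enough ambiguous bases and is not too long."""
    # Assumption: anything other than A, C, G, or T is treated as ambiguous.
    ambiguous = sum(base not in "ACGT" for base in seq.upper())
    return ambiguous <= max_ambig and len(seq) <= max_length


def screen(contigs, max_ambig=0, max_length=490):
    """Split a {name: sequence} dict into kept and removed dicts."""
    kept, removed = {}, {}
    for name, seq in contigs.items():
        target = kept if passes_screen(seq, max_ambig, max_length) else removed
        target[name] = seq
    return kept, removed


# Made-up contigs for illustration only.
contigs = {
    "clean": "ACGT" * 100,                          # 400 bases, no ambiguity
    "ambiguous": "ACGT" * 50 + "N" + "ACGT" * 50,   # 401 bases, one N
    "too_long": "ACGT" * 125,                       # 500 bases
}

kept, removed = screen(contigs)
print(sorted(kept))      # ['clean']
print(sorted(removed))   # ['ambiguous', 'too_long']
```

With little overlap, most bases in a contig are supported by only one read, so errors are not corrected and many contigs end up failing one of the two checks.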
I’d encourage you to consult this blog post on the effects of having minimal overlap between reads.
Thank you for your reply.
Yes, I’ve read your post about the large distance matrix. However, we only have the 2x250 kit available, and the libraries were prepared after amplifying the V3-V4 region. Unfortunately, I was not involved in that step and could not argue for a better alternative.
Regarding the last run, we found that the sequencing kit provided by Illumina was out of date, which surely caused the poor sequencing quality. I detected many ambiguous bases, and the Q scores were very low overall.
We will now repeat the run using another kit.