Hello, I have some questions similar to this post from a few years back.
I’ve got my first V4 .fastq data from a 2x250 MiSeq run, and it seems I need to remove the primers, as my reads were assembling into ~290 nt contigs instead of the expected ~250 nt for the V4 region. I am following the MiSeq SOP with 28 .fastq files for 14 samples and used the updated 16S EMP V4 primers:
515F (Parada)–806R (Apprill), forward-barcoded.
To remove them, I made an oligos.txt file:
primer GTGYCAGCMGCCGCGGTAA GGACTACNVGGGTWTCTAAT
and ran make.contigs():
make.contigs(file=stability.files, oligos=oligos.txt, processors=4)
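As a rough sanity check (a Python sketch, not part of the mothur pipeline), the two primers in the oligos line add up to 39 nt. That matches the difference between the 292 nt contigs I saw before primer removal and the 253 nt contigs I get after it:

```python
# Primer sequences copied from the oligos line above
forward_primer = "GTGYCAGCMGCCGCGGTAA"   # 515F (Parada)
reverse_primer = "GGACTACNVGGGTWTCTAAT"  # 806R (Apprill)

primer_total = len(forward_primer) + len(reverse_primer)
untrimmed_contig = 292  # contig length observed before primer removal

print(len(forward_primer), len(reverse_primer), primer_total)  # 19 20 39
print(untrimmed_contig - primer_total)                          # 253
```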
Before using the oligos file I had about 1.5 million unique sequences and could not get past either the dist() & cluster() steps or the cluster.split() step in the MiSeq SOP, an issue similar to the one in the blog post here. With the oligos file, though, I now get ~30k unique sequences, can run dist() and cluster() without issue, and the pipeline finishes.
This might all be fine, but I have a few questions I’d like to understand better:
Do I need to account for degeneracy (the Ys, Ms, Ns, etc. in the primers, if that is the correct term) by listing all the possible permutations in my oligos file? (e.g. this post)
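To get a sense of how many permutations that would mean, here is a minimal Python sketch that expands the standard IUPAC ambiguity codes. `expand_primer` and `primer_regex` are hypothetical helpers of my own, not mothur features, and this says nothing about how mothur itself handles these codes in an oligos file. With this table, 515F expands to 4 sequences (Y × M) and 806R to 24 (N × V × W). The regex version shows the alternative of matching every variant with a single pattern instead of listing them all.

```python
import re
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every concrete sequence a degenerate primer can represent."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def primer_regex(primer):
    """Compile one pattern that matches any variant, e.g. Y -> [CT]."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else f"[{options}]")
    return re.compile("".join(parts))


fwd = "GTGYCAGCMGCCGCGGTAA"   # 515F (Parada)
rev = "GGACTACNVGGGTWTCTAAT"  # 806R (Apprill)
print(len(expand_primer(fwd)), len(expand_primer(rev)))           # 4 24
print(bool(primer_regex(rev).fullmatch("GGACTACAAGGGTATCTAAT")))  # True
```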
After including the oligos file, my contigs align to a different region of the Silva reference (silva.nr_v132), which I did not expect. Before (with primers not removed), my 292 nt contigs aligned to start=11895, stop=25318, which matches the mothur MiSeq SOP. Now, using the oligos file, my 253 nt contigs align to start=13862, stop=23444, which is… different. I have now made a new custom alignment from silva.nr_v132 for this region with pcr.seqs(), but is this alignment shift normal?
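One way to picture the shift: assuming the start/stop values are the 1-based columns of the first and last non-gap characters in the aligned read (my assumption about how they are computed), removing bases from either end can only move start to the right and stop to the left, and in a gappy reference like Silva a ~20 nt primer can span many alignment columns. The toy Python sketch below uses hypothetical helpers `alignment_span` and `mask_ends`. `mask_ends` simply blanks the end bases of an existing alignment, whereas re-aligning trimmed reads may place gaps differently.

```python
GAP_CHARS = set(".-")


def alignment_span(aligned: str) -> tuple[int, int]:
    """1-based columns of the first and last non-gap character (assumed start/stop)."""
    cols = [i for i, ch in enumerate(aligned, start=1) if ch not in GAP_CHARS]
    if not cols:
        raise ValueError("sequence contains only gaps")
    return cols[0], cols[-1]


def mask_ends(aligned: str, n_left: int, n_right: int) -> str:
    """Blank out the first n_left and last n_right *bases* (not columns) with '.',
    a crude stand-in for a read whose primers were removed before alignment."""
    chars = list(aligned)
    base_cols = [i for i, ch in enumerate(chars) if ch not in GAP_CHARS]
    if n_left + n_right > len(base_cols):
        raise ValueError("cannot remove more bases than the sequence has")
    for i in base_cols[:n_left] + base_cols[len(base_cols) - n_right:]:
        chars[i] = "."
    return "".join(chars)


read = "..GTG-CAG--CTAGGA-TT.."  # toy aligned read; 'GTG' and 'TT' play the primers
trimmed = mask_ends(read, 3, 2)
print(alignment_span(read))     # (3, 20)
print(trimmed)                  # .....-CAG--CTAGGA-....
print(alignment_span(trimmed))  # (7, 17)
```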
make.contigs() now throws a warning: "[WARNING]: your oligos file does not contain any group names. mothur will not create a groupfile." Is this an issue if my stability.files has all the reads properly mapped to samples? The resulting data seem to be grouped fine, but I want to make sure. From the forum, it looks like others have had to add a line like “barcode none none sample1” etc. to the oligos file for sequences that have had their barcodes removed but still have primers.
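If the sample groups come from stability.files rather than from the oligos file, a quick check that every sample maps to exactly one forward and one reverse file can be reassuring. Here is a minimal Python sketch, assuming the three-column layout used in the MiSeq SOP (sample name, forward .fastq, reverse .fastq, whitespace-separated). `parse_stability_files` and the example file names are hypothetical, and the check does not tell you how mothur itself assigns groups.

```python
def parse_stability_files(text):
    """Map sample name -> (forward fastq, reverse fastq).

    Assumes one sample per line: name, forward file, reverse file,
    separated by whitespace (tabs in the SOP's stability.files).
    """
    mapping = {}
    seen_files = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {len(fields)}")
        sample, fwd, rev = fields
        if sample in mapping:
            raise ValueError(f"line {lineno}: sample {sample!r} listed twice")
        for path in (fwd, rev):
            if path in seen_files:
                raise ValueError(f"line {lineno}: file {path!r} used twice")
            seen_files.add(path)
        mapping[sample] = (fwd, rev)
    return mapping


example = (
    "sample1\tsample1_R1.fastq\tsample1_R2.fastq\n"
    "sample2\tsample2_R1.fastq\tsample2_R2.fastq\n"
)
groups = parse_stability_files(example)
print(len(groups), groups["sample2"])  # 2 ('sample2_R1.fastq', 'sample2_R2.fastq')
```

With 28 .fastq files for 14 samples, `len(groups)` should come out as 14 and no error should be raised.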
More of a theory question perhaps, but let’s assume the primers are left on and the 292 nt contigs align to start=11895, stop=25318. Why don’t the primers just overhang the shorter V4 alignment and get trimmed off during screen() or filter(), sending ~250 nt reads downstream? I do understand that removing primers is best practice; I’m just curious.
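To make the question concrete, here is a simplified Python model of column filtering, loosely based on my understanding of the filter.seqs options (vertical drops columns that are gaps in every sequence; trump drops columns where any sequence has the given character). It is not mothur’s implementation, and `filter_columns` is a hypothetical name. In this model, primer columns are only dropped when some reads don’t cover them. If every read carries the primer, those columns are never all-gap, so they survive the vertical filter.

```python
def filter_columns(alignment, vertical=True, trump=None):
    """Simplified column filter (a model, not mothur's filter.seqs code).

    vertical=True: drop columns where every sequence has a gap ('-' or '.').
    trump=c:       drop columns where any sequence has the character c.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    keep = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        if vertical and all(ch in "-." for ch in column):
            continue
        if trump is not None and trump in column:
            continue
        keep.append(col)
    return ["".join(seq[c] for c in keep) for seq in alignment]


# Toy reads that all still carry a 2-nt "primer" (GT) at the left end
reads = ["..GT-ACGT..", "..GT-ACG-..", "..GT-AC-T.."]
print(filter_columns(reads))  # ['GTACGT', 'GTACG-', 'GTAC-T']

# Add one read that starts after the primer: with trump='.', the primer columns go
short_read = "....-ACGT.."
print(filter_columns(reads + [short_read], trump="."))  # ['ACGT', 'ACG-', 'AC-T', 'ACGT']
```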
Appreciate any thoughts.