Hello,
I received some sequence data that the core lab had already demultiplexed, with adapters and barcodes removed, before it reached us. The sequencing was run on an Illumina MiSeq using the V4 primers (515F/806R) described by Caporaso et al. What confuses me is that a few bases (ranging from zero to at least three) appear at the start of the F and R reads, before the primers. I do not have much experience with primer removal in this context and was hoping to ask for thoughts on the following:
- Is this normal/expected, or should I expect the primers to be the first 19 (F) and 20 (R) bases? If the latter, could this reflect incomplete removal of the adapter/barcode/spacer during demultiplexing and trimming?
- Can I simply provide the primer sequences to the trim.seqs command, and will it also remove any leading bases? If so, would I need to run this on each file prior to make.contigs? Or can I simply use something like fastx_trimmer to remove, say, the first 23 bases? Would that introduce artificial variation, since each read would then have a different length (perhaps only a problem, if it is one at all, for clustering/splitting algorithms that do not include a reference-based alignment step prior to clustering)?
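One way to think about both questions is to search for the primer within the first few bases of each read and cut through its last base, so every trimmed read starts at the same amplicon position no matter how many leading bases it carried. The Python sketch below illustrates that idea only; it is not how trim.seqs works internally, and the names `find_primer` and `trim_through_primer`, the 5-base search window, and the zero-mismatch default are illustrative assumptions. It uses the degenerate 515F form GTGCCAGCMGCCGCGGTAA (M = A or C), which is my recollection of the Caporaso et al. sequence and should be checked against the paper.

```python
# Sketch: locate a degenerate primer near the 5' end of a read and trim through it.
# Assumptions (not mothur behavior): 5-base search window, exact IUPAC match by default.

# IUPAC nucleotide codes -> the concrete bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

F515 = "GTGCCAGCMGCCGCGGTAA"  # assumed degenerate 515F (M = A or C)


def count_mismatches(window, primer):
    """Count positions where the read base is not allowed by the primer's IUPAC code.
    An ambiguous read base such as N counts as a mismatch."""
    return sum(1 for base, code in zip(window.upper(), primer) if base not in IUPAC[code])


def find_primer(read, primer, max_offset=5, max_mismatches=0):
    """Return (offset, mismatches) for the best match starting at 0..max_offset,
    preferring fewer mismatches, then the smaller offset; None if nothing fits."""
    best = None
    for offset in range(max_offset + 1):
        window = read[offset:offset + len(primer)]
        if len(window) < len(primer):
            break  # read too short to hold the primer at this offset
        mm = count_mismatches(window, primer)
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (offset, mm)
    return best


def trim_through_primer(read, primer, max_offset=5, max_mismatches=0):
    """Remove any leading bases plus the primer itself; None if the primer is not found."""
    hit = find_primer(read, primer, max_offset, max_mismatches)
    if hit is None:
        return None
    return read[hit[0] + len(primer):]


if __name__ == "__main__":
    # Shortened copies of the first two forward reads in the post (1 and 0 leading bases).
    for read in ["GGTGCCAGCAGCCGCGGTAATACGTAGG", "GTGCCAGCCGCCGCGGTAATACGTAGG"]:
        # primer hit, primer-anchored trim, fixed 23-base trim
        print(find_primer(read, F515), trim_through_primer(read, F515), read[23:])
```

Trimming through the primer leaves both example reads starting at the same base, whereas a fixed cut such as `read[23:]` removes a different amount of real sequence from reads with different numbers of leading bases, so the trimmed reads no longer start at the same amplicon position.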
Any thoughts on this matter would be greatly appreciated! Below are the first few lines of an example FASTA file, with the primer sequences underlined in the first F and R read to show what I am seeing. I believe I located the R primer correctly, given that the primers are degenerate.
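To check a hand-identified primer window against a degenerate primer, each read base can be compared with the set of bases its IUPAC code allows. The sketch below (the helper name `mismatch_positions` is hypothetical) lists the 0-based positions that do not fit. The primer strings are the degenerate forms as I recall them from Caporaso et al., which is an assumption worth verifying against the paper.

```python
# Sketch: list where a candidate primer window disagrees with a degenerate primer.

# IUPAC nucleotide codes -> the concrete bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Degenerate primer sequences as recalled from Caporaso et al.; verify before use.
PRIMERS = {
    "515F": "GTGCCAGCMGCCGCGGTAA",
    "806R": "GGACTACHVGGGTWTCTAAT",
}


def mismatch_positions(window, primer):
    """Return 0-based positions where the window's base is not allowed by the
    primer's IUPAC code. Windows of the wrong length are rejected, not truncated."""
    if len(window) != len(primer):
        raise ValueError(f"window has {len(window)} bases, primer has {len(primer)}")
    return [i for i, (base, code) in enumerate(zip(window.upper(), primer))
            if base not in IUPAC[code]]


if __name__ == "__main__":
    # The window underlined in the first reverse read of the post.
    print(mismatch_positions("GGATTTCGGGTGTATCTTAT", PRIMERS["806R"]))
```

An empty list means the window is consistent with the primer at every degenerate position; any listed positions are where the underlined bases and the primer disagree.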
Forward reads for the example file (515F: GTGCCAGCAGCCGCGGTAA):
G[u]GTGCCAGCAGCCGCGGTAA[/u]TACGTAGGTGGCGAGCGTTGTCCGGATTTACTGTGCGTAAAGAGAGCGTAGGCGGACTTTTAAGTGTGTTGTGAAATACTCGGCCTCAACTTCAGTGCTGCATTTCAAACTGGAAGTCTAGAGTGCAGAGGAGGAGAGTGGAATTCCTCGTGTAGCGGTGAAATGCGTGGTGATTAGGAAGAACACCAGTGGCGAGGGCGATTCTCTGGCCTGTAACTGCCGCTGAGGCTC
GTGCCAGCCGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTTTTTGTCGTAAAGGGAGGGCAGGCGGTGTCTTTAGGTTGAAGTGAAAGCACCCGGCCCAACCGGGAAGGCTCCATGGCAACTGGGAGGTTTGGGTGCCGAAGAGGGGGGGGGGATTCCATGTGTAGCGGTGAAATGCGTAGATGTATGGGGGAACACCAGTGGCGAAGGCGGCTCTCTGGTGTGGCACTGGAGCTGAGGCTCG
CGTGCCAGCCGCCGCGGTAATACGTAGGTGGCGAGCGTTGTCCGGATTTACTGGGCGTAAAGGGAGCGTAGGCGGGCTTTTAAGTGAGATGTGAAATACTCGGGCTCAACTTCAGTGCTGCATTGCAAACTGGAAGCTTAGGGTGCAGGAGAGGAGACTGGAATTCCTAGGTTAGCGGTGAAATGCGTAGTGATTAGGAAGAACACCAGTGGCGAAGGCGATTCTCTGCGCTGTAACTGCCGCTGGGGCTC
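To see how the number of leading bases is distributed across a whole file, the offset at which the forward primer first matches can be tallied per read. This sketch uses the first 30 bases of the three forward reads above, exact IUPAC matching, and a hypothetical 5-base search window (`leading_offset` and `offset_histogram` are illustrative names). It assumes the degenerate 515F with M (A or C) at the ninth position; with the A-only primer written in the post, reads carrying C at that position would be counted as not found (`None`) instead.

```python
# Sketch: tally how many bases precede the forward primer across a set of reads.
from collections import Counter

# IUPAC nucleotide codes -> the concrete bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

F515 = "GTGCCAGCMGCCGCGGTAA"  # assumed degenerate 515F (M = A or C)


def leading_offset(read, primer, max_offset=5):
    """Smallest offset (0..max_offset) where the primer matches exactly under
    IUPAC rules, or None if it never does."""
    for offset in range(max_offset + 1):
        window = read[offset:offset + len(primer)].upper()
        if len(window) == len(primer) and all(b in IUPAC[c] for b, c in zip(window, primer)):
            return offset
    return None


def offset_histogram(reads, primer, max_offset=5):
    """Counter mapping each offset (None = primer not found) to its number of reads."""
    return Counter(leading_offset(r, primer, max_offset) for r in reads)


if __name__ == "__main__":
    # First 30 bases of the three forward reads in the post.
    reads = [
        "GGTGCCAGCAGCCGCGGTAATACGTAGGTG",
        "GTGCCAGCCGCCGCGGTAATACGTAGGTGG",
        "CGTGCCAGCCGCCGCGGTAATACGTAGGTG",
    ]
    print(offset_histogram(reads, F515))
```

On a full file, a histogram concentrated on a handful of small offsets would be consistent with a deliberate spacer or "heterogeneity" design, while a large `None` count would point to primer mismatches or a wrong primer sequence; either reading is a hypothesis to confirm with the core lab.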
Reverse reads for the example file (806R: GGACTACHVGGGTWTCTAAT):
TCT[u]GGATTTCGGGTGTATCTTAT[/u]CCTTNTTGCTCACCACGCTTTCGGTCCTCAGCGTCTGTTACAGACCAGAGAGCCGCCGTCGCCACTTGTGTTCTTCCTATTCTCTACGTCTTTCACCGCTCCACTAGGATTTCCATCCTCCTCTCCTGCACTCTAGTCTTCCGTTTTGACATGCATCGCTCCTGTTCAGCGCGGGTTTTCATCATCCTCCTTCAATGTCCGCCTCCGCCCTCTTTACCCTCATTAATC
TGCGGACTACGGGGTTTTCTAATCTTTNTTGCTCACCACTCTTTCGCGCCCCAGCGTCATTTAAAGACCAGAGACTCGCTTTCGCCACTGGTGTTCCTCCATATATTTACGCTTTTAACGGCTACACGAGGATTTCCACTCTCTTCTCCTGTACTTCATTCTACCGGTTTCCAAGGCCTCGCGCATGTGGAGCCCGAGGTTTTCACATCAGTCTTAAGAGACCTCCGTCGCTTTCTTTCCGCGCATTAATC
TCCGCAATACTCGTGTATCTTATCCTGNTCGCTCCCCACTCTTTCGTCCCTCAGCGTCAGTTCCAGCCCAGAGACTCGCCTTCGCCATGGGAGTTCTTCCTAATCTCTACGCATTTCACCGCTACACTGGGAATTCCACTCTCCTCTCCTGCACTCTAGTCTCCCTGTTTCACATGCACCGCTCGCGTTGAGCCCGTCTTTTTCACTTCTCTCTTCAAGCTCGCCCTTCGCCCTCTTAACCCCCATAAATC
TCAGGTCTACAGGGTTTTCATATCCTGNTTGCTCTCCACGCTTTCGACCCTAAGTGTCAGTTACAGCCCAGAGAGCCGCTTTCGTCACGGGTGTTCCTTCATCTATCTACGCATTTCACCGCTACACATGGATTTCCACTCCTCTCTTCTGCACTCAAGTCTCCCAGTTTCCAATGTCTCCCGCGTGTTGAGCCGGTGCCTTTCACCTCAGTCTCAAGTTACTGCCTGCGCCCTCTTCACGCACAAAAATT
ACTGGAATACCCGGGTATCATATCCTGNTTGCTCCCAACGCTTGCGATCCTCAGCGTCATTTACAGACCAGTGACCCGCTCTCGCCACTGGGGTTCCTCCATATATCTACGCATTTCACCGCTACACGTGGTATTCCACACTCCTCTTCTGTACTAAAGTCTCTCATTTTCCAAAGACTAGTCCCGGTTCAGCCGGGGTGTTTAACATCAGTCTCGAGAAACCCCCATCGTCTGCTTTGCGCACCTTCAAT
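On running primer removal on each file before make.contigs: if primers are trimmed outside mothur, both reads of a pair have to be kept or dropped together, or the forward and reverse files fall out of step. The sketch below shows that bookkeeping for a single pair. The function names, the 5-base window, and the 2-mismatch tolerance are assumptions for illustration (some of the reverse reads above do not match 806R exactly), not mothur behavior, and the primer strings are the degenerate forms as I recall them from Caporaso et al.

```python
# Sketch: trim primers (plus any leading bases) from a read pair, keeping pairs in sync.

# IUPAC nucleotide codes -> the concrete bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

F515 = "GTGCCAGCMGCCGCGGTAA"   # assumed degenerate 515F
R806 = "GGACTACHVGGGTWTCTAAT"  # assumed degenerate 806R


def locate(read, primer, max_offset=5, max_mismatches=2):
    """Best (offset, mismatches) starting at 0..max_offset, preferring fewer
    mismatches and then the smaller offset; None if none is within tolerance."""
    best = None
    for offset in range(max_offset + 1):
        window = read[offset:offset + len(primer)].upper()
        if len(window) < len(primer):
            break  # read too short to hold the primer at this offset
        mm = sum(b not in IUPAC[c] for b, c in zip(window, primer))
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (offset, mm)
    return best


def trim_pair(fwd, rev, max_offset=5, max_mismatches=2):
    """Return (trimmed_fwd, trimmed_rev), or None if either primer is missing,
    so the pair is dropped as a unit rather than one read at a time."""
    hit_f = locate(fwd, F515, max_offset, max_mismatches)
    hit_r = locate(rev, R806, max_offset, max_mismatches)
    if hit_f is None or hit_r is None:
        return None
    return fwd[hit_f[0] + len(F515):], rev[hit_r[0] + len(R806):]


if __name__ == "__main__":
    # Shortened starts of the first forward read and the second reverse read in the post.
    print(trim_pair("GGTGCCAGCAGCCGCGGTAATACG", "TGCGGACTACGGGGTTTTCTAATCTTT"))
```

A FASTQ-aware version would also need to cut each quality string by the same amount as its sequence, so that quality-based steps downstream still line up.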