Continuing the discussion from How does primer barcode checking on make.contigs work:
Thank you, Pat. Yes, I do have a case where the primers are still present in the merged read after make.contigs. Here is the example:
R1.fastq
@M00347:15:000000000-JNGG2:1:2108:12900:22557 1:N:0:TCTACGACAT+TATGAGTGAT
GCGTCAGGACAGTGCCCGGTATTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCAAATTCTCCAGGCGAGCTTGCTCTGTCTCTTATACACATCTCCGAGCCCACGA
+
CCCCDDEFFEFEGGGGGGGGGGHHHHGGGGGHHGGGGGHHGGGGGHHHHGGGGGGGGGHHHHHHHHGGHHHHHHHHHHHHHHGGGGGHGGGGGGGGGHGGGGHHHHHHGGGGGHGGGGHGGGGGHHGHHHHHHHHHHGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFFFFFFFCCFFFFFFFFFFHHHFFFFFFFFFFFFFFFHHHFFFFFFFFFFFFFFFHFFFFFFFFFFFFFFFFFFFFFFFFFFFF=
R2.fastq
@M00347:15:000000000-JNGG2:1:2108:12900:22557 2:N:0:TCTACGACAT+TATGAGTGAT
AGCAAGCTCGCCTGGAGAATTTGGCGCTCATCGCTATCAATAATGGACGGCGCCAGCTCTGTCGCCGTGTGCTGACTCATGATTTTTTCTATATCAATGCCGATACGCTGTCGGGATATGACGGCCAGCGCCGTTGTCGCACAGTGGCTGATGCTGCCAAATAGGCCGTCCGGCCATAGCGGCTGTCGCTTGTCGCCAATACCGGGCACTGTCCTGACGCCTGTCTCTTATACACATCTGACGCTGCCGAC
+
CCCDDFCFFCDDGGGGGGGGGGHHHGGGGGHHHGHGGHHHHHHHHHHHGGGGGGGGGHHHHHHGHGGGGHFHHHHHHHGHHHHHHHHGHHHHHHHHHHHHHGGGGGGGGGGHGGGGGHHHHHGGGGGGGGGGGGGGGGGHGGGHHHFGHEFGGFGGGGGGGGGGFGFDGGGGGGFDBBBFFFADFFF.99EEAFA?DFFFBBF<@DCAFFFBFFFF:BA?B.<BFFFBFFFBFBBBB0:BFB.9A.AF>--
I1.fastq
@M00347:15:000000000-JNGG2:1:2108:12900:22557 1:N:0:TCTACGACAT+TATGAGTGAT
TCTACGACAT
+
EEEEEEEEEE
I2.fastq
@M00347:15:000000000-JNGG2:1:2108:12900:22557 2:N:0:TCTACGACAT+TATGAGTGAT
TATGAGTGAT
+
CCCCCFFFFF
batch_file
R1.fastq R2.fastq I1.fastq I2.fastq
test.oligos
primer CTGTTGCGGAGTATCGTCAGA CAGCGTTAACACCAGTTGCTC OG0002548primerGroup4
primer GCGTCAGGRCRGTGCCCGGTA AGCRAGCTCGCCTGGAGAATT OG0003025primerGroup8
barcode TCTACGACAT TATGAGTGAT 2013K_0463
When we run make.contigs(file=batch_file, oligos=test.oligos, insert=25, trimoverlap=f, allfiles=0, pdiffs=1), it generates the following merged FASTA file:
>M00347_15_000000000-JNGG2_1_2108_12900_22557 ee=0.350836 fbdiffs=0(match), rbdiffs=0(match) fpdiffs=0(match), rpdiffs=0(match)
GTCGGCAGCGTCAGATGTGTATAAGAGACAGGCGTCAGGACAGTGCCCGGTATTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCAAATTCTCCAGGCGAGCTTGCTCTGTCTCTTATACACATCTCCGAGCCCACGA
But the primers can still be found in the merged read:
primer GCGTCAGGRCRGTGCCCGGTA AGCRAGCTCGCCTGGAGAATT OG0003025primerGroup8
The matched primers are:
GCGTCAGGACAGTGCCCGGTA
AATTCTCCAGGCGAGCTTGCT
These primers are also at the very 5' end (the start) of the reads in R1.fastq and R2.fastq, as indicated by fpdiffs=0(match) and rpdiffs=0(match). But somehow they were not trimmed by mothur?
After trimming off those two primers (with cutadapt), the read looks like this:
>M00347_15_000000000-JNGG2_1_2108_12900_22557 ee=0.350836 fbdiffs=0(match), rbdiffs=0(match) fpdiffs=0(match), rpdiffs=0(match)
TTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCA
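For anyone who wants to reproduce the check, here is a minimal Python sketch (not mothur code) that locates the primer pair in the merged read above. It assumes exact IUPAC matching, so it does not model pdiffs=1, and the helper names `primer_regex` and `revcomp` are just illustrative. The reverse primer is searched as its reverse complement, because the merged read is in the R1 orientation.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, keeping IUPAC ambiguity codes consistent."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a primer with IUPAC codes into an exact-match regex."""
    return re.compile("".join(IUPAC[base] for base in primer))


# Primer pair OG0003025primerGroup8 from test.oligos.
forward = "GCGTCAGGRCRGTGCCCGGTA"
reverse = "AGCRAGCTCGCCTGGAGAATT"

# First 21 bases of the R1 and R2 reads shown above.
r1_start = "GCGTCAGGACAGTGCCCGGTA"
r2_start = "AGCAAGCTCGCCTGGAGAATT"

# Merged read written by make.contigs (copied from the FASTA above).
merged = "GTCGGCAGCGTCAGATGTGTATAAGAGACAGGCGTCAGGACAGTGCCCGGTATTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCAAATTCTCCAGGCGAGCTTGCTCTGTCTCTTATACACATCTCCGAGCCCACGA"

# Both raw reads begin with their primer.
r1_has_primer = primer_regex(forward).match(r1_start) is not None
r2_has_primer = primer_regex(reverse).match(r2_start) is not None

# Locate the primers inside the merged read.
fwd_hit = primer_regex(forward).search(merged)
rev_hit = primer_regex(revcomp(reverse)).search(merged)

# Region between the two primers, i.e. what primer trimming should leave.
insert = merged[fwd_hit.end():rev_hit.start()]

print("R1 starts with forward primer:", r1_has_primer)
print("R2 starts with reverse primer:", r2_has_primer)
print("forward primer in merged read:", fwd_hit.group(), fwd_hit.span())
print("reverse primer (revcomp) in merged read:", rev_hit.group(), rev_hit.span())
print("bases before forward primer:", fwd_hit.start())
print("bases after reverse primer:", len(merged) - rev_hit.end())
print("region between primers:", insert)
```

The two "bases before/after" values show how far from each end of the merged read the primers sit, which may be relevant to why they were not trimmed.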
I am wondering whether I missed something here.
Best…