mothur

Primers (fpdiffs=0(match), rpdiffs=0(match) ) still found in the merged reads after make.contigs

Continuing the discussion from How does primer barcode checking on make.contigs work:

Thank you Pat. Yes, I did have such a case where primers are still found in merged reads after make.contigs. Here is the example:

R1.fastq
@M00347:15:000000000-JNGG2:1:2108:12900:22557 1:N:0:TCTACGACAT+TATGAGTGAT
GCGTCAGGACAGTGCCCGGTATTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCAAATTCTCCAGGCGAGCTTGCTCTGTCTCTTATACACATCTCCGAGCCCACGA
+
CCCCDDEFFEFEGGGGGGGGGGHHHHGGGGGHHGGGGGHHGGGGGHHHHGGGGGGGGGHHHHHHHHGGHHHHHHHHHHHHHHGGGGGHGGGGGGGGGHGGGGHHHHHHGGGGGHGGGGHGGGGGHHGHHHHHHHHHHGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFFFFFFFCCFFFFFFFFFFHHHFFFFFFFFFFFFFFFHHHFFFFFFFFFFFFFFFHFFFFFFFFFFFFFFFFFFFFFFFFFFFF=

R2.fastq
@M00347:15:000000000-JNGG2:1:2108:12900:22557 2:N:0:TCTACGACAT+TATGAGTGAT
AGCAAGCTCGCCTGGAGAATTTGGCGCTCATCGCTATCAATAATGGACGGCGCCAGCTCTGTCGCCGTGTGCTGACTCATGATTTTTTCTATATCAATGCCGATACGCTGTCGGGATATGACGGCCAGCGCCGTTGTCGCACAGTGGCTGATGCTGCCAAATAGGCCGTCCGGCCATAGCGGCTGTCGCTTGTCGCCAATACCGGGCACTGTCCTGACGCCTGTCTCTTATACACATCTGACGCTGCCGAC
+
CCCDDFCFFCDDGGGGGGGGGGHHHGGGGGHHHGHGGHHHHHHHHHHHGGGGGGGGGHHHHHHGHGGGGHFHHHHHHHGHHHHHHHHGHHHHHHHHHHHHHGGGGGGGGGGHGGGGGHHHHHGGGGGGGGGGGGGGGGGHGGGHHHFGHEFGGFGGGGGGGGGGFGFDGGGGGGFDBBBFFFADFFF.99EEAFA?DFFFBBF<@DCAFFFBFFFF:BA?B.<BFFFBFFFBFBBBB0:BFB.9A.AF>--

I1.fastq
@M00347:15:000000000-JNGG2:1:2108:12900:22557 1:N:0:TCTACGACAT+TATGAGTGAT
TCTACGACAT
+
EEEEEEEEEE

I2.fastq
@M00347:15:000000000-JNGG2:1:2108:12900:22557 2:N:0:TCTACGACAT+TATGAGTGAT
TATGAGTGAT
+
CCCCCFFFFF

batch_file
R1.fastq        R2.fastq        I1.fastq        I2.fastq

test.oligos
primer  CTGTTGCGGAGTATCGTCAGA   CAGCGTTAACACCAGTTGCTC   OG0002548primerGroup4
primer  GCGTCAGGRCRGTGCCCGGTA   AGCRAGCTCGCCTGGAGAATT   OG0003025primerGroup8
barcode TCTACGACAT      TATGAGTGAT      2013K_0463

And we run make.contigs(file=batch_file, oligos=test.oligos, insert=25, trimoverlap=f, allfiles=0, pdiffs=1), it generates merged fasta file:

>M00347_15_000000000-JNGG2_1_2108_12900_22557   ee=0.350836     fbdiffs=0(match), rbdiffs=0(match) fpdiffs=0(match), rpdiffs=0(match)
GTCGGCAGCGTCAGATGTGTATAAGAGACAGGCGTCAGGACAGTGCCCGGTATTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCAAATTCTCCAGGCGAGCTTGCTCTGTCTCTTATACACATCTCCGAGCCCACGA

But the primers can still be found in the merged reads:

primer  GCGTCAGGRCRGTGCCCGGTA   AGCRAGCTCGCCTGGAGAATT   OG0003025primerGroup8

The matched primers are:

GCGTCAGGACAGTGCCCGGTA
AATTCTCCAGGCGAGCTTGCT

And these primers can also be found at the very end of R1.fastq and R2.fastq, as indicated by fpdiffs=0(match), rpdiffs=0(match) ? But somehow they were not trimmed by Mothur ?

And after trimming off those 2 primers (with cutadapt), the reads looks like this:

>M00347_15_000000000-JNGG2_1_2108_12900_22557   ee=0.350836     fbdiffs=0(match), rbdiffs=0(match) fpdiffs=0(match), rpdiffs=0(match)
TTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCA

Wondering if I missed something here ?

Best…

Lets take a closer look at what mothur is doing:

R1.fastq

GCGTCAGGACAGTGCCCGGTATTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCAAATTCTCCAGGCGAGCTTGCTCTGTCTCTTATACACATCTCCGAGCCCACGA

R2.fastq

AGCAAGCTCGCCTGGAGAATTTGGCGCTCATCGCTATCAATAATGGACGGCGCCAGCTCTGTCGCCGTGTGCTGACTCATGATTTTTTCTATATCAATGCCGATACGCTGTCGGGATATGACGGCCAGCGCCGTTGTCGCACAGTGGCTGATGCTGCCAAATAGGCCGTCCGGCCATAGCGGCTGTCGCTTGTCGCCAATACCGGGCACTGTCCTGACGCCTGTCTCTTATACACATCTGACGCTGCCGAC

After primer trimming:

R1.fastq

TTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCAAATTCTCCAGGCGAGCTTGCTCTGTCTCTTATACACATCTCCGAGCCCACGA

R2.fastq

TGGCGCTCATCGCTATCAATAATGGACGGCGCCAGCTCTGTCGCCGTGTGCTGACTCATGATTTTTTCTATATCAATGCCGATACGCTGTCGGGATATGACGGCCAGCGCCGTTGTCGCACAGTGGCTGATGCTGCCAAATAGGCCGTCCGGCCATAGCGGCTGTCGCTTGTCGCCAATACCGGGCACTGTCCTGACGCCTGTCTCTTATACACATCTGACGCTGCCGAC

mothur reverses R2 and aligns the fragments to each other:

R2 reversed = GTCGGCAGCGTCAGATGTGTATAAGAGACAGGCGTCAGGACAGTGCCCGGTATTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCA

R1.aligned:

…TTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCAAATTCTCCAGGCGAGCTTGCTCTGTCTCTTATACACATCTCCGAGCCCACGA

R2.aligned:

GTCGGCAGCGTCAGATGTGTATAAGAGACAGGCGTCAGGACAGTGCCCGGTATTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCA----------------------------------------------------

Because R2 starts at 0 and the overlap begins at 52 mothur grabs the first 52 chars of R2 as the beginning of the contig.

contig start = GTCGGCAGCGTCAGATGTGTATAAGAGACAGGCGTCAGGACAGTGCCCGGTA

The overlap is from 52 to 230:

TTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCA

The contigs end is taken from R1 position 231 to 282:

AATTCTCCAGGCGAGCTTGCTCTGTCTCTTATACACATCTCCGAGCCCACGA

The full contig is:

GTCGGCAGCGTCAGATGTGTATAAGAGACAGGCGTCAGGACAGTGCCCGGTATTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCAAATTCTCCAGGCGAGCTTGCTCTGTCTCTTATACACATCTCCGAGCCCACGA

Thank you Sara ! It is clear that Mothur is doing what it is supposed to do !

Looks like we have primer read-through on the 5’ end, so the primer on the 3’ end was getting left in the merged read. We can run cutadapt on the merged read to remove those left-over primers, but this involves extra steps and is not ideal.

The other option we tried was to set trimoverlap = t. But our reads vary highly in length, so this is suboptimal. Wondering if there is other settings, commands we can use in this case ? Could qcr.seqs() potentially work ?

Thanks again !

Pcr.seqs is a great option to resolve issues like this.

mothur > make.contigs(file=batch_file, oligos=test.oligos, insert=25, trimoverlap=f, allfiles=0, pdiffs=1)
mothur > pcr.seqs(fasta=current, count=current, oligos=test2.oligos, pdiffs=1)

*.pcr.fasta:

 >M00347_15_000000000-JNGG2_1_2108_12900_22557	fpdiffs=0(match) rpdiffs=0(match) 		ee=0.350836	fbdiffs=0(match), rbdiffs=0(match) fpdiffs=0(match), rpdiffs=0(match) 
 TTGGCGACAAGCGACAGCCGCTATGGCCGGACGGCCTATTTGGCAGCATCAGCCACTGTGCGACAACGGCGCTGGCCGTCATATCCCGACAGCGTATCGGCATTGATATAGAAAAAATCATGAGTCAGCACACGGCGACAGAGCTGGCGCCGTCCATTATTGATAGCGATGAGCGCCA