Reverse complement of barcodes with trim.seqs

Hi all,

First, I'm very grateful to anyone who can help out!

  1. I had to merge the paired ends with PEAR, because my data is from Mr. DNA and the original fastq files contain a mix of forward and reverse sequences. There's a way to do this in mothur by running make.contigs without an oligos file, but that returns an error saying that some sequence names appear in the forward fastq but not in the reverse fastq. I've read I can fix this with remove.seqs, but that requires me to first run unique.seqs. Since I want to remove the sequences that appear in only one of the two fastq files, I would need to tell this command to look in both files at once. Is this possible?

  2. Alternatively, using the PEAR-merged data, I have gotten as far as trim.seqs to demultiplex, but this step leaves me with only a few hundred reads per sample (am I reading the number next to each sample ID in the output correctly?), and the file containing these sequences is 3.0 MB while the scrap file is 2.0 GB. I know the reads are of very high quality. Because the reads exist in both forward and reverse orientation, I realized I can search for the reverse complements of my barcodes. Incorporating these reverse reads should roughly double my reads per sample, but that would still leave the vast majority of reads in the scrap file. Any thoughts? Here are the first few entries of the scrap file:

M01522_77_000000000-AR1WK_1_1101_16596_1576|b fbdiffs=1000(noMatch), rbdiffs=1000(noMatch) fpdiffs=0(match), rpdiffs=0(match)
AATAACGACCTGCATGAAACATAGAACGATTCGATAGCTGTCTTGCACACACATCTTGGTAAAATATTCATCAGTACATAGAATATGCTCGTCAGCTAATACCACTATAAAGTACAAAACGACCCTCTTAATCTTTATAGGTATAAAATCTAGTACACATCCAGGCGTTTTTATACAAGGTCATGGCCCCAAGAAGCGAGACGATGACGATAAAGAGCCAGAAGATGCAGCGTGGTAAACACAAGACTGTTTAATTGGGGGGATAAGCTTATCTAATCCACATAAAGTGAACAAAGGGATGAAAAGGAGACGAAGTAAAAAGGGGAAAGAAAGAAGTAAAGAAGAATGGAAAAAGAAGAAAAAGAAAGTAGAAAAAAGAAAACGGAAAGAGGAAAAAAAAAAAAAAAAAAAAAAAAGGAAGGGGAAGGAAAAAAGGCGGAAAG

M01522_77_000000000-AR1WK_1_1101_20602_1604|bf fbdiffs=1000(noMatch), rbdiffs=1000(noMatch) fpdiffs=1000(noMatch), rpdiffs=1000(noMatch)
AGCATTTAAATACTGGGAAGGACTTCTTGGGCCATTAAGCGAAACATGTATCTTTGCACTTATTAGTTTGTCTCCATTGGATGGCTTTGTTCACCTTAGGTGGATTAGATAAGCTTATCGCCCCAATTAAACAGTCTTGTGTTTACCACGCTGCATCTTCTGGCTCTTTATCGTCATCGTCTCGCTTCTTGGGGCCATCACCTTGTATACAAACGCCTGGATGTGTACTAGATTTTATCCCTATAAAGCTTCAGAGGGTCTTTTTGTCCTTTATTGTGGTCTTAGGTCACCAGAATATTCGGTGTAATGATGAATATTGTAAAATGATGTGTGAGAAAGAAAGATAGCGAATAGGTCTATGTATCATGAAGGTAGTTATG
M01522_77_000000000-AR1WK_1_1101_14624_1609|bf fbdiffs=1000(noMatch), rbdiffs=1000(noMatch) fpdiffs=1000(noMatch), rpdiffs=1000(noMatch)
TCCGTAGAGTAGCTTGGTGTTGGTAGATTAATTTTAAGCTCATTAAGATAGCATTTAAATACTTGGAAGGACTTCTTGTGCCCTTAAGCGAAACATGTATCTTTGCACTTATTAGTTGGGCTCCATTGGGTGGCTTTGTTCACCTTAGTTTGATTAGATAAGCTTATCTCCCCAATTAAACAGTCTTGTGTTTACCACGCTGCATCTTCTGGCTCTTTATCGTCATCGTCTCGCTTCTTGGGGCCATCACCTTGTATACAAACGCCTGGATGTGTACTAGATTTTATCCCTATAAAGCTTAAGAGGGTCTGATTGGAAGTTAATGTGGTGTTAGCTAACAAGCAGATTATATGTAAAGAAGAATATTTTAAAATGAGGTGAGAGAAAGAAAGCTATCGAATCGGAATAAGTAACATGCAGGTAGTTATG
M01522_77_000000000-AR1WK_1_1101_16028_1611|b fbdiffs=1000(noMatch), rbdiffs=1000(noMatch) fpdiffs=0(match), rpdiffs=0(match)
AATAACGACCTGCATGAAACATAGAACGATTCGATAGCTGTCTTGCACACACATCATGGTTAAATATTCATCAGTACTAAGAATATGCTGGTGTGCTAAGACCTCAATCAAGGACAAAAAGACCCTCTTATGCTTTATCGGGCTAAAATCTAGTACACATCCAGTCGTTTTTATACAAGGTGATGGCCCCAAGAAGCGAGACGATGACGATAAAGAGCCAGAAGATGCAGCGTGGTAAACACAAGACTGTTTAATTGGGGAGATAAGCTTATCTAATCCACCTAAGGTGAACAAAGAAATCAAATGGAGAGAAACAAATAAGAGAAAAGATAAAGGTGTAGAGTAATGGAGAAAGAAGTGATTCAAAGAATTGAAATGCTATCAAAAAGAGATGAAAAGGAAAAAAAAAAGAACAAGCGAAACGACGGATAAAAGGCAAAGACTG
M01522_77_000000000-AR1WK_1_1101_19243_1611|f fbdiffs=0(match), rbdiffs=0(match) fpdiffs=1000(noMatch), rpdiffs=1000(noMatch)
GAGGAGTAGCCTGTTATCCGTAGAGTAGCTTGGTGTTGGTAGATTAATTTTAAGCTCATTAAGATATCATTTAAATACTGGGAAGGACTTCTTTGGCCATTAAGCGAAACATGTATCTTTGCACTTATTAGTTGGTCTCCATTGGATGGCTTTGTTCACCTTAGGTGGATTAGATAAGCTTATCGCCCCAATTAAACAGTCTTGTGTTTACCACGCTGCATCTTCTGGCTGTTTATCGTCATCGTCTCGCTTCTTGGGGCCATCACCTTGTATACAAACGCCTGGATGTGTAGTAGATTTTAGAGCTATAAAGATTCAGAGGGTATTTTGGTGATATATAGTGGTATGAGATAAACAGCAGATTCTGTGTACTGATGAATAATGTAAAATGATGGGTGTGAAAGAGAGATATAGAATCGTTCTAG
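For reference, reverse-complementing a barcode just means reversing it and swapping A<->T and C<->G. A minimal Python sketch, using a few of the barcodes from the oligos file posted further down in this thread (the dictionary and function names are illustrative only, not part of mothur):

```python
# Minimal sketch: reverse-complement barcodes so reverse-oriented reads can be
# recognised. Handles uppercase A/C/G/T plus N only.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# A few barcodes from the oligos file, keyed by sample name.
barcodes = {
    "C4": "GAGGACTG",
    "C6": "GAGGAGTA",
    "D3": "GAGGCCAC",
}

for sample, bc in barcodes.items():
    # Print sample, original barcode, and its reverse complement.
    print(sample, bc, reverse_complement(bc))
```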

Thanks so much!!
Dan

What version of mothur are you running? The make.contigs command should be “smart” enough to skip reads that are missing from one of the two files. You can make sure both files contain the same sequences by doing the following:

mothur > list.seqs(fastq=forwardFastqFile) - list sequences in forward file
mothur > get.seqs(fastq=reverseFastqFile) - select forward sequences from reverse file (may not select all if some are not present)
mothur > list.seqs(fastq=current) - list sequences that were in forward and also in reverse
mothur > get.seqs(fastq=forwardFastqFile) - select sequences that were in forward and also in reverse
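If you would rather do the same check outside mothur, here is a minimal Python sketch of the list.seqs/get.seqs idea: collect the read names from both files, keep the intersection, and write out only those records. It assumes plain (gunzipped) 4-line FASTQ records and that the forward and reverse copies of a read share the same name up to the first space; the function names are hypothetical. Compressed files could be opened with gzip.open(path, "rt") instead of open.

```python
from typing import Iterator, Set, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, plus, quality) from unwrapped 4-line FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def read_name(header: str) -> str:
    """Read name: header text after '@', up to the first whitespace."""
    return header[1:].split()[0]


def shared_names(fwd: TextIO, rev: TextIO) -> Set[str]:
    """Names present in both the forward and the reverse file."""
    fwd_names = {read_name(rec[0]) for rec in read_fastq(fwd)}
    rev_names = {read_name(rec[0]) for rec in read_fastq(rev)}
    return fwd_names & rev_names


def filter_fastq(handle: TextIO, keep: Set[str], out: TextIO) -> int:
    """Write only records whose name is in `keep`; return how many were written."""
    written = 0
    for rec in read_fastq(handle):
        if read_name(rec[0]) in keep:
            out.write("\n".join(rec) + "\n")
            written += 1
    return written


# Usage (file names are hypothetical):
# with open("R1.fastq") as f, open("R2.fastq") as r:
#     keep = shared_names(f, r)
# with open("R1.fastq") as f, open("R1.shared.fastq", "w") as out:
#     filter_fastq(f, keep, out)
```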

You are correct about the output. The sampleID is followed by the number of sequences mothur identified with that barcode. Could you post your oligos file?
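As a cross-check on those per-sample numbers, trim.seqs also writes a group file that assigns each kept read to a sample, so tallying it should reproduce the counts. A minimal Python sketch, assuming the group file has two whitespace-separated columns (read name, then sample name); the function name is hypothetical:

```python
from collections import Counter
from typing import Iterable


def count_groups(lines: Iterable[str]) -> Counter:
    """Count reads per sample from 'readname<whitespace>sample' lines."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:  # skip blank or malformed lines
            continue
        counts[fields[1]] += 1
    return counts


# Usage (file name is hypothetical):
# with open("merged.groups") as fh:
#     for sample, n in sorted(count_groups(fh).items()):
#         print(sample, n)
```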

Hi,

Thanks so much for all your help.

The version of mothur is 1.37.6.
I realized that I simply had to gunzip the files before the analysis, and then they ran smoothly. Unfortunately, I still ended up with only a few hundred reads per sample (the exact same numbers as with the PEAR-merged data).

As requested, here’s the oligos file (oligos.txt):

primer GCCTGTTATCCGTAGAGTAGC cp23SR
primer AATAACGACCTGCATGAAAC cp23SF
barcode GAGGACTG NONE C4
barcode GAGGAGTA NONE C6
barcode GAGGCCAC NONE D3
barcode GAGGCGGA NONE D4
barcode GAGGCTTA NONE E3
barcode GAGGTTGA NONE E4
barcode GAGTATCT NONE F1
barcode GAGTCACA NONE F3
barcode GAGTCCTT NONE G3
barcode GAGTCTGT NONE G5
barcode GAGTGTTG NONE H2
barcode GAGTTAAT NONE H6
barcode GAGTTGCC NONE I3
barcode GAGTTGCT NONE I4
barcode GATACATA NONE J6
barcode GATACATT NONE J7
barcode GATAGCCA NONE K1
barcode GATATCTT NONE K2
barcode GATATGTC NONE L5
barcode GATATGTT NONE L6

Thanks again,
Dan

Also note that the file is tab-delimited; pasting it here seems to have removed the tabs.
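If the tabs get lost when copying the file around, a small Python sketch like this can restore them. It assumes no field contains internal spaces, which holds for the primer and barcode lines above; the function name is hypothetical:

```python
def retab_oligos(text: str) -> str:
    """Rejoin whitespace-separated oligos fields with single tabs, dropping blank lines."""
    out = []
    for line in text.splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if fields:
            out.append("\t".join(fields))
    return "\n".join(out) + "\n"


# Usage (file names are hypothetical):
# with open("oligos_pasted.txt") as src, open("oligos.txt", "w") as dst:
#     dst.write(retab_oligos(src.read()))
```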

Dan

Can you try this?

primer GCCTGTTATCCGTAGAGTAGC AATAACGACCTGCATGAAAC cp23SR
primer AATAACGACCTGCATGAAAC GCCTGTTATCCGTAGAGTAGC cp23SF
barcode GAGGACTG NONE C4
barcode GAGGAGTA NONE C6
barcode GAGGCCAC NONE D3
barcode GAGGCGGA NONE D4
barcode GAGGCTTA NONE E3
barcode GAGGTTGA NONE E4
barcode GAGTATCT NONE F1
barcode GAGTCACA NONE F3
barcode GAGTCCTT NONE G3
barcode GAGTCTGT NONE G5
barcode GAGTGTTG NONE H2
barcode GAGTTAAT NONE H6
barcode GAGTTGCC NONE I3
barcode GAGTTGCT NONE I4
barcode GATACATA NONE J6
barcode GATACATT NONE J7
barcode GATAGCCA NONE K1
barcode GATATCTT NONE K2
barcode GATATGTC NONE L5
barcode GATATGTT NONE L6

mothur > make.contigs(file=yourFileFile, oligos=oligosFile, checkorient=T, pdiffs=2, bdiffs=1)

Note: This option is available in version 1.38.0 or later.
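As I understand it, the point of checkorient is to look for the barcode and primer at the start of each read and, failing that, at the start of its reverse complement. A rough Python illustration of that idea follows. It is not mothur's implementation: it uses plain Hamming distance with no insertions or deletions, assumes the barcode is immediately followed by the primer, and borrows bdiffs=1 and pdiffs=2 from the command above. The function names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign(read: str, barcodes: Dict[str, str], primers: Sequence[str],
           bdiffs: int = 1, pdiffs: int = 2) -> Optional[Tuple[str, str]]:
    """Return (sample, orientation) if barcode+primer starts the read
    or its reverse complement; otherwise None."""
    for orientation, seq in (("forward", read.upper()), ("reverse", revcomp(read))):
        for sample, bc in barcodes.items():
            if len(seq) < len(bc) or mismatches(seq[:len(bc)], bc) > bdiffs:
                continue
            rest = seq[len(bc):]  # the primer should follow the barcode
            for primer in primers:
                if (len(rest) >= len(primer)
                        and mismatches(rest[:len(primer)], primer) <= pdiffs):
                    return sample, orientation
    return None


BARCODES = {"C4": "GAGGACTG", "C6": "GAGGAGTA"}
PRIMERS = ("GCCTGTTATCCGTAGAGTAGC", "AATAACGACCTGCATGAAAC")
```

For example, the scrap read ending in _19243_1611 starts with GAGGAGTA (C6) followed by GCCTGTTATCCGTAGAGTAGC, so assign() would place it in C6 in the forward orientation; the reverse complement of that same read would be placed in C6 in the reverse orientation.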