Three small issues with make.contigs() on MiSeq data

Hi,
I’ve been having three issues with make.contigs() applied to a MiSeq dataset.

  1. Primers don’t get trimmed. In fact, the command seems to ignore them altogether (i.e., I get the same number of seqs with or without the primers in the .oligos file). Barcodes, on the other hand, do get trimmed.
    My command looks like this:
make.contigs(ffastq=NG-6690-2_S1_L001_R1_001.fastq, rfastq=NG-6690-2_S1_L001_R2_001.fastq, oligos=Ladakh.elevation.oligos, bdiffs=2, pdiffs=3, processors=4)

and the .oligos file is:

forward NTACGGGWGGCWGCA
forward NNTACGGGWGGCWGCA
forward NNNTACGGGWGGCWGCA
reverse NNNNNGGGTMTCTAATCCBKTT
reverse NNNNNNGGGTMTCTAATCCBKTT
reverse NNNNNNNGGGTMTCTAATCCBKTT
reverse NNNNNNNNGGGTMTCTAATCCBKTT
barcode TGGTGTTG H1.D.1
barcode AGGAGAAG H1.D.2
barcode CACAGTGT H1.D.3
barcode GTACACGT H1.D.4

  2. Out of ~13 million raw sequences in the MiSeq run, I get about 3.5 million after running make.contigs(). Is this normal? Running grep on the scrap file finds 7 million more hits of the primers without any mismatch. Should I maybe relax the quality filtering thresholds?

  3. The function doesn’t produce anything like a name file. How do I later identify which sequence belongs to which sample, now that the barcodes are trimmed?

I’m using 64-bit mothur 1.31 on Linux.

Thanks in advance
Roey
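To repeat the grep check while honoring the degenerate bases (N, W, M, B, K) in these primers, here is a minimal Python sketch. It is not part of mothur: it turns IUPAC codes into a regular expression and counts sequences that contain the primer with no mismatches. The function names and the file name in the usage line are hypothetical.

```python
import re

# IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer into a regex that allows no mismatches."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def read_fasta(path):
    """Yield the sequences of a FASTA file, joining wrapped sequence lines."""
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    yield "".join(chunks)
                chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        yield "".join(chunks)


def count_hits(sequences, primer):
    """Count sequences containing the primer anywhere (per sequence, not per line)."""
    pattern = primer_regex(primer)
    return sum(1 for seq in sequences if pattern.search(seq.upper()))
```

Usage, with a placeholder path: `count_hits(read_fasta("your.scrap.contigs.fasta"), "TACGGGWGGCWGCA")`. Searching for the core forward primer (without the leading N's) keeps the count independent of the padding length.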

The make.contigs command is expecting the oligos file to look like:

primer CCTACGGGAGGCAGCAG ATTACCGCGGCTGCTGG V3
primer ATTAGAWACCCBDGTAGTCC CCCGTCAATTCMTTTRAGT V5
primer ACTYAAAKGAATTGACGGG ACRACACGAGCTGACGAC V6
BARCODE ccaac cactg F01R2A
BARCODE ccaac aacca F01R2B
BARCODE ccaac tgtca F01R2C
BARCODE ccaac aaacc F01R2D

The names are optional, but the primers and barcodes should be paired.
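A sketch of how this paired layout can be read, in Python. The field order (type, forward sequence, reverse sequence, optional name) follows the example above; the function name and the decision to skip other line types are assumptions, and this is not mothur's own parser. It does show why a three-field barcode line is a problem in paired mode: the third field is taken as the reverse sequence.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Entry = Tuple[str, str, Optional[str]]  # (forward, reverse, name)


@dataclass
class PairedOligos:
    primers: List[Entry] = field(default_factory=list)
    barcodes: List[Entry] = field(default_factory=list)


def parse_paired_oligos(lines):
    """Parse 'primer FWD REV [name]' and 'barcode FWD REV [name]' lines.

    The third field is always read as the reverse sequence, so a line like
    'barcode TGGTGTTG H1.D.1' gets 'H1.D.1' as its reverse barcode.
    """
    oligos = PairedOligos()
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # blank line
        kind = fields[0].lower()
        if kind not in ("primer", "barcode"):
            continue  # other line types are skipped in this sketch
        if len(fields) < 3:
            raise ValueError(f"paired {kind} line needs two sequences: {raw!r}")
        name = fields[3] if len(fields) > 3 else None
        entry = (fields[1].upper(), fields[2].upper(), name)
        if kind == "primer":
            oligos.primers.append(entry)
        else:
            oligos.barcodes.append(entry)
    return oligos
```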

Thanks Sarah!

I modified my oligos file as you suggested. Now make.contigs() trims the primers but still nearly all my sequences end up in scrap.
Among them I can clearly detect sequences with both primers and the barcode present without a single mismatch.
Here’s an example:

My oligos file looks like this:

primer NTACGGGWGGCWGCA NNNNNGGGTMTCTAATCCBKTT f1-r1
primer NTACGGGWGGCWGCA NNNNNNGGGTMTCTAATCCBKTT f1-r2
primer NTACGGGWGGCWGCA NNNNNNNGGGTMTCTAATCCBKTT f1-r3
primer NTACGGGWGGCWGCA NNNNNNNNGGGTMTCTAATCCBKTT f1-r4
primer NNTACGGGWGGCWGCA NNNNNGGGTMTCTAATCCBKTT f2-r1
primer NNTACGGGWGGCWGCA NNNNNNGGGTMTCTAATCCBKTT f2-r2
primer NNTACGGGWGGCWGCA NNNNNNNGGGTMTCTAATCCBKTT f2-r3
primer NNTACGGGWGGCWGCA NNNNNNNNGGGTMTCTAATCCBKTT f2-r4
primer NNNTACGGGWGGCWGCA NNNNNGGGTMTCTAATCCBKTT f3-r1
primer NNNTACGGGWGGCWGCA NNNNNNGGGTMTCTAATCCBKTT f3-r2
primer NNNTACGGGWGGCWGCA NNNNNNNGGGTMTCTAATCCBKTT f3-r3
primer NNNTACGGGWGGCWGCA NNNNNNNNGGGTMTCTAATCCBKTT f3-r4
barcode TGGTGTTG H1.D.1
barcode AGGAGAAG H1.D.2
barcode TGGTTGGT H1.D.3
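Writing all twelve forward/reverse padding combinations by hand is error-prone. A small Python helper (hypothetical, not a mothur feature) can generate the primer lines above from the two core primers and the number of leading N's on each side:

```python
# Core primers without their leading N padding, taken from the oligos file above.
FORWARD_CORE = "TACGGGWGGCWGCA"
REVERSE_CORE = "GGGTMTCTAATCCBKTT"


def primer_lines(fwd_pads=(1, 2, 3), rev_pads=(5, 6, 7, 8)):
    """Return one paired 'primer' line per forward/reverse padding combination.

    Names follow the f<i>-r<j> pattern, where i and j index the padding lists.
    """
    lines = []
    for i, f in enumerate(fwd_pads, start=1):
        for j, r in enumerate(rev_pads, start=1):
            fwd = "N" * f + FORWARD_CORE
            rev = "N" * r + REVERSE_CORE
            lines.append(f"primer {fwd} {rev} f{i}-r{j}")
    return lines
```

`print("\n".join(primer_lines()))` prints the twelve lines in the same order as the file above.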

This is one of the sequences in the scrap:

M01822_7_000000000-A4EE9_1_1101_15974_1473 | bf
TGGTTGGTATACGGGAGGCAGCAGTCGGGAATTTTGGGCAATGGGGGAAACCCTGACCCAGCAACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCGAAAGAATAGGAAGAATTAATGACGGTACTATTTATAAGGTCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGACCAAGCGTTGTTCGGATTTACTGGGCGTAAAGGGCGCGTAGGCGGCGCGGTAAGTCACTTGTGAAATCTCTGAGCTTAACTCAGAACGGCCAAGTGATACTGCAGTGCTAGAGTGTGGAAGGGGCAATCGGAATTCTTGGTGTAGCGGTGAAATGCGTAGATATCAAGAGGAACACCTGAGGTGAAGACGGGTTGCTGGGCCAACACTGACGCTGAGGCGCGAAAGCCAGGGGAGCAAACCGGATTAGAGACCCTTTTA

And here are the two reads from the fastq files:
@M01822:7:000000000-A4EE9:1:1101:15974:1473 1:N:0:1
TGGTTGGTATACGGGAGGCAGCAGTCGGGAATTTTGGGCAATGGGGGAAACCCTGACCCAGCAACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCGAAAGAATAGGAAGAATTAATGACGGTACTATTTATAAGGTCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGACCAAGCGTTGTTCGGATTTACTGGGCGTAAAGGGCGCGTAGGCGGCGCGGTAAGTCACTTGTGAA
+
BBBBBBBAFFFFGCGAECEFGCGGFGFGGGCHHHHF?EHHGHHHHGGGGCGEFHGHHHFHHHFHFCFEEGEEEGGHHFEGHHHGGHHHHEFFHAGHHFHDDFFFEFGGHHHHHHHFGHGHHFHFFHHGGGHGHFHHGFHFHHHBGBDCCCFFHFHHGCCHEHGAHHFHGGGG;;.0FBEBFFDGFGFFAFEDA.@AEFFFFFFFFFFFFEFFBFDAFFFFFFFFCF.B99-=AFF.:AFBBBFFFFBFBB

@M01822:7:000000000-A4EE9:1:1101:15974:1473 2:N:0:1
TAAAAGGGTCTCTAATCCGGTTTGCTCCCCTGGCTTTCGCGCCTCAGCGTCAGTGTTGGCCCAGCAACCCGTCTTCACCTCAGGTGTTCCTCTTGATATCTACGCATTTCACCGCTACACCAAGAATTCCGATTGCCCCTTCCACACTCTAGCACTGCAGTATCACTTGGCCGTTCTGAGTTAAGCTCAGAGATTTCACAAGTGACTTACCGCGCCGCCTACGCGCCCTTTACGCCCAGTAAATCCGAAC
+
AAAA3DAA4CCFGGGGGGGG22EEHHGGGCAFAGHCGHDAA?AEFCFFEEEEEGFFGEAFFH1GFBGGHG?EDAGFFHHHHHFGGFHGFGGHHF?FGED4DDHECEDFBFGHBEGGGCHHHHGHHGGH>GB?AACBGFEGHFHHFHGHCGHHHGHHFFFHHDFDFHHFF0CFCC.<EFBCGGHGHHHHHHHCGFFHHEGHCFBFGFFFFBBD-?BG??B?ABD=---@BFFFFF.-;9.9/;/9BBDD.@

The sequence contains barcode 3 (H1.D.3) and primer pair f1-r1 without any mismatch, so why did it end up in the scrap file?
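To confirm independently that the forward read starts with barcode H1.D.3 followed by the f1 primer, and that the reverse read starts with the r1 primer, the comparison can be sketched in Python. It only checks the oligos at the very start of each read against IUPAC codes; the function names are hypothetical and this is an illustration, not how mothur searches.

```python
# IUPAC codes: each letter maps to the bases it accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(pattern, seq):
    """Count positions where seq violates the degenerate pattern.

    Bases missing because seq is shorter than pattern count as mismatches.
    """
    diffs = max(0, len(pattern) - len(seq))
    for code, base in zip(pattern.upper(), seq.upper()):
        if base not in IUPAC[code]:
            diffs += 1
    return diffs


def check_read(read, barcode, primer, bdiffs=0, pdiffs=0):
    """True if read starts with barcode immediately followed by primer."""
    start = len(barcode)
    b = mismatches(barcode, read[:start])
    p = mismatches(primer, read[start:start + len(primer)])
    return b <= bdiffs and p <= pdiffs
```

With the first bases of the two reads, `check_read("TGGTTGGTATACGGGAGGCAGCAGTCGG", "TGGTTGGT", "NTACGGGWGGCWGCA")` and `check_read("TAAAAGGGTCTCTAATCCGGTTTGCTCC", "", "NNNNNGGGTMTCTAATCCBKTT")` both return True, so the oligos themselves do match this pair.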

Thanks again

mothur expects paired barcodes as well. For “barcode TGGTGTTG H1.D.1”, mothur reads H1.D.1 as the barcode of the reverse fragment. When it can’t find that barcode in the reverse read, the sequence fails. If you want to see more detail about what mothur is reading, try setting the debug flag:

set.dir(debug=t)
make.contigs(…)

Thanks!
So if I don’t have a reverse barcode, should I just use the first character of the primer instead?

Also, good to know about the debug=t tip.

Roey

Mothur is expecting a barcode for each read.

barcode TGGTGTTG reverseBarcode H1.D.1

The TGGTGTTG would get removed from the reads in ffastq, and reverseBarcode from the reads in rfastq.
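A minimal Python sketch of that behavior, under stated assumptions: barcodes are given as (forward, reverse, group) tuples, no mismatches are allowed, and oligos are only looked for at the start of each read. The function names and the reverse barcode ACGTA used in the check are made up for illustration. The returned group name is the information that keeps samples apart once the oligos are trimmed.

```python
# IUPAC codes: each letter maps to the bases it accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def fits(pattern, seq):
    """True if seq is as long as pattern and matches it with no mismatches."""
    return len(seq) == len(pattern) and all(
        base in IUPAC[code] for code, base in zip(pattern.upper(), seq.upper())
    )


def assign_and_trim(fwd_read, rev_read, barcodes, primer_pair):
    """Find the sample for a read pair and strip its oligos.

    barcodes: list of (forward_barcode, reverse_barcode, group) tuples.
    primer_pair: (forward_primer, reverse_primer).
    Returns (group, trimmed_fwd, trimmed_rev), or None if no barcode pair fits.
    """
    fwd_primer, rev_primer = primer_pair
    for fwd_bc, rev_bc, group in barcodes:
        fwd_oligo = fwd_bc + fwd_primer  # removed from the ffastq read
        rev_oligo = rev_bc + rev_primer  # removed from the rfastq read
        if fits(fwd_oligo, fwd_read[:len(fwd_oligo)]) and fits(
            rev_oligo, rev_read[:len(rev_oligo)]
        ):
            return group, fwd_read[len(fwd_oligo):], rev_read[len(rev_oligo):]
    return None
```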

OK, I’ve reformatted my oligos file and tried running it again.
Still, nearly all seqs end up in scrap.

For example, here is part of the oligos file (I added a primer name because without it the groups file wasn’t created):

primer NTACGGGWGGCWGCA GGGTMTCTAATCCBKTT p
barcode CTGACTGA NNNNN H3.D.1


This is a read from the forward fastq file:

@M01822:7:000000000-A4EE9:1:1101:16324:1454 1:N:0:1
CTGACTGAATACGGGTGGCTGCAGTGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCAATGCCGCGTGGGTGAAGAAGGTCTTCGGATTGTAAAGCCCTTTCGACGGGGACGATGATGACGGTACCCGTAGAAGAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCTAGCGTTGCTCGGAATTACTGGGCGTAAAGGGCGCGTAGGCGGCTCGCTTAGTCAGGCGTGAA
+
BABBBFFFFFFFGGGGGGGGGGHHHHHHGGGGHHHHHFHGHGHHHHGGGGEHGHGHHHHHHHHHHGHGGGGGGGGEGHHGHHHHHHHHHHGEGGHHHHHFHHHHHHHGGGGGGGGGGGEGHGGHGFGGHGHFGEFFGHHGHFHHHGGGGGGHGGGGGFGGGGFEGFGGGGFFA?ABFFEE?DFFFFDFFFF=DDFBFFDFFFFFBBFFFFFFFFFFFFFFCACFA-A:@FFFDFAEFFF0000BFD9:AE

And this is its mate from the reverse fastq file:

@M01822:7:000000000-A4EE9:1:1101:16324:1454 2:N:0:1
TGTCGAGGGTATCTAATCCGGTTTGCTCCCCACGCTGTCGCGCCTCAGCGTCAGTAACGGACCAGCTCGCCGCCTTCGCCACCGGTGTTCTTCCCAATATCTACGAATTTCACCTCTACACTGGGAATTCCGCGAGCCTCTTCCGTCCTCTAGCCCACCCGTCTCAAGCGCAGTCCCCAGGTTGAGCCCAGGAATTTCACGCCTGACTTAGCGAGCCGCCTACGCGCCCTTTACGCCCAGTAATTCCGAGC
+
BBAAABBBBBFFGGGGGGGFGGGGGHHHHGGGGGGAEGHDEGCFGGFHHGGGGCGHHHGGGGGEAGHFDG?EGGGGHHGG?EHDFFGFFHDHHHHHHHFFHHHFFDECGHFFBGHHBDGHHBFGFFHHHHHHG?DGCFHHBGFHGHHFGHFHHGHHGEDECGGFFFGBBGG?ADFBCEBDFFFFFFBFFF.E;BFFBBFF?FFE.BBF/:/;-@9@BFAF…B.9–ADBFFFB?DD@.9/:B/:BFFFB-

Even though both primers and barcodes appear in it without a mismatch, it ends up in the scrap file:

M01822_7_000000000-A4EE9_1_1101_16324_1454 | f
ATACGGGTGGCTGCAGTGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCAATGCCGCGTGGGTGAAGAAGGTCTTCGGATTGTAAAGCCCTTTCGACGGGGACGATGATGACGGTACCCGTAGAAGAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCTAGCGTTGCTCGGAATTACTGGGCGTAAAGGGCGCGTAGGCGGCTCGCTTAGTCAGGCGTGAAATTCCTGGGCTCAACCTGGGGACTGCGCTTGAGACGGGTGGGCTAGAGGACGGAAGAGGCTCGCGGAATTCCCAGTGTAGAGGTGAAATTCGTAGATATTGGGAAGAACACCGGTGGCGAAGGCGGCGAGCTGGTCCGTTACTGACGCTGAGGCGCGACAGCGTGGGGAGCAAACCGGATTAGATACCCTCGACA

mothur trimmed only the forward barcode, not the primers or the reverse barcode.
Why?

Thanks for bringing this bug to our attention. It only affects primers or barcodes for the reverse fragment that are all N’s. I have fixed it and uploaded version 1.31.2 to the wiki.
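For reference, the expected matching rule for an all-N oligo can be sketched in Python: under IUPAC rules N accepts any of A, C, G and T, so an all-N barcode or primer of length k should simply consume the first k bases of the read. This illustrates the rule only; it is not the code of the fix, and the function names are hypothetical.

```python
# IUPAC codes: each letter maps to the bases it accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def oligo_fits(oligo, seq):
    """True when seq is exactly as long as oligo and every base is allowed.

    An all-N oligo therefore accepts any A/C/G/T bases of the right length.
    """
    return len(seq) == len(oligo) and all(
        base in IUPAC[code] for code, base in zip(oligo.upper(), seq.upper())
    )


def trim_leading(read, oligo):
    """Strip oligo from the start of read, or return None if it does not fit."""
    head = read[:len(oligo)]
    return read[len(oligo):] if oligo_fits(oligo, head) else None
```

For example, `trim_leading("ACGTAGGGTCTC", "NNNNN")` removes the first five bases and leaves `"GGGTCTC"`.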