barcodes and primers only partially removed during make.contigs

Hello,

I don’t know whether this is a bug in the program or a problem with my oligo file - I’ve checked the file and cannot find anything wrong with it. I’m running mothur v1.39.5 on Ubuntu 14.04 LTS, and there’s enough free space on the hard disk. This is the sixth time I’m using mothur, the best pipeline ever for metabarcoding, and I can’t understand what’s going on. The sequences are MiSeq V3 paired-end reads and the amplicons are 300-400 bp long. Both the F and R primers were barcoded.
This is the command:
make.contigs(ffastq=Oom11F_R1.fastq, rfastq=Oom11F_R2.fastq, oligos=OligoOom.txt, pdiffs=1, bdiffs=0, processors=12, rename=T, checkorient=T)
This is the oligo file, tab-separated:
primer GCGGAAGGATCATTACCAC TCTTCATCGDTGTGCGAGC
barcode GCTTCTAG ATAGCTTG AEF001
barcode GCTTCTAG CCTTAATG AEF002
barcode GCTTCTAG CACATGCT AEF003
barcode GCTTCTAG TGTCATGC AEF004
barcode GCTTCTAG CGATAAGG AEF005
barcode GCTTCTAG TTACGCGA AEF006
barcode GCTTCTAG TGGAGCTT AEF007
barcode GCTTCTAG GATACTGC AEF008 etc. (150 samples + mock community)
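A quick sanity check of an oligo file laid out like the one above could look like the following. This is only a minimal sketch, not mothur's own parser: the function name `check_oligos` is hypothetical, it handles just the `primer` and paired `barcode` lines shown in this post, and it splits on any whitespace rather than strictly on tabs.

```python
from collections import Counter

# IUPAC nucleotide codes that may appear in a primer or barcode.
IUPAC_BASES = set("ACGTRYSWKMBDHVN")

def check_oligos(text):
    """Return a list of problems found in paired-barcode oligo file text."""
    problems = []
    pair_counts = Counter()
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # assumption: tabs or spaces both accepted
        if not fields or fields[0].startswith("#"):
            continue
        kind = fields[0].lower()
        if kind == "primer" and len(fields) >= 3:
            oligos = fields[1:3]
        elif kind == "barcode" and len(fields) == 4:
            oligos = fields[1:3]
            pair_counts[tuple(o.upper() for o in oligos)] += 1
        else:
            # Anything else is outside what this sketch understands.
            problems.append(f"line {lineno}: unexpected layout: {line!r}")
            continue
        for oligo in oligos:
            bad = sorted(set(oligo.upper()) - IUPAC_BASES)
            if bad:
                problems.append(f"line {lineno}: non-IUPAC characters {bad} in {oligo}")
    for pair, count in pair_counts.items():
        if count > 1:
            problems.append(f"barcode pair {pair} is used {count} times")
    return problems

# The first three lines of the oligo file quoted above.
example = "\n".join([
    "primer\tGCGGAAGGATCATTACCAC\tTCTTCATCGDTGTGCGAGC",
    "barcode\tGCTTCTAG\tATAGCTTG\tAEF001",
    "barcode\tGCTTCTAG\tCCTTAATG\tAEF002",
])
print(check_oligos(example))  # an empty list means nothing was flagged
```

An empty result only says the layout is consistent and no barcode pair is reused; it says nothing about whether the barcodes match the reads.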
After make.contigs (which works well: good overlap, 13 million assembled sequences), all sequences in the fasta file have been renamed according to the sample name, but not all of them have had the barcodes and primers removed!
This one is OK; it starts after the F primer and ends before the reverse primer:

1_SEF045
ACCTAAAAACTTTCCACGTGAACTGTCGTTATTTGTTGTGCGCTCTCTGCGGTGTCGGTGGCGTCTGCTGGCTTTGTTGCTGGCGGGTGCGAGCCGGATGCGGAGGCTGAACGAAGGTCGAGTTGCTTTGCTCTCGGCTGACTTATTTTTCAAACCCAATACCAAACTTACTGATTATACTGTGAGAACGAAAGTTCTTGCTTTTAACTAGATAACAACTTTCAGCAGTGGATGTCTAG
The following one still has both barcodes corresponding to HEF041 (the first and last 8 characters) as well as the F and R primers:
1_HEF041
CATCTTGAGCGGAAGGATCATTACCACACCAAAAAACACCCCACGTGAATGTATTCTGTATGAGGCTTGTGCTGCTCTTAGGGGCGGCTAGCCGAAGGTTTCGCAAGAGACCGATGTATTTTTAATCCCTTTTATTAAATGACTGATCAAAAACTGCAGACAGAAATGTGTGCATTCAATTGAAATACAACTTTCAACAGTGGATGTCTAGGCTCGCACAACGATGAAGATACACAGT
Running trim.seqs doesn’t fix the problem: it discards all the sequences that had already been trimmed, which of course are the majority, and I end up with only 5 million sequences. Just let me know if I should send the oligo file and part of the fastq files… Any idea what’s going on, and is there a solution?
I will really appreciate your answer… I have 8 runs to analyze!
Thanks,
Anna Maria
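The symptom described in this post can be tested per sequence with a short script. The sketch below is an illustration, not part of mothur: the helper names (`revcomp`, `matches`, `leftover_primers`) are hypothetical, barcodes are assumed to be 8 bases long as in the oligo file above, and matching is exact apart from IUPAC degeneracy (no equivalent of `pdiffs` or `bdiffs`).

```python
# IUPAC code -> set of bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def revcomp(oligo):
    """Reverse complement, keeping IUPAC degenerate codes consistent."""
    return oligo.upper().translate(COMPLEMENT)[::-1]

def matches(oligo, fragment):
    """True if every base of fragment is allowed by the oligo's IUPAC code."""
    return len(oligo) == len(fragment) and all(
        base in IUPAC.get(code, "") for code, base in zip(oligo, fragment)
    )

def leftover_primers(contig, fprimer, rprimer, barcode_len=8):
    """Return (front, back): is each primer still present just inside a barcode?"""
    contig = contig.upper()
    front = matches(fprimer, contig[barcode_len:barcode_len + len(fprimer)])
    back_end = len(contig) - barcode_len
    back_start = max(0, back_end - len(rprimer))
    back = matches(revcomp(rprimer), contig[back_start:back_end])
    return front, back

F_PRIMER = "GCGGAAGGATCATTACCAC"
R_PRIMER = "TCTTCATCGDTGTGCGAGC"

# First 27 and last 27 bases of the two reads quoted above (middle left out).
hef041 = "CATCTTGAGCGGAAGGATCATTACCAC" + "GCTCGCACAACGATGAAGATACACAGT"
sef045 = "ACCTAAAAACTTTCCACGTGAACTGTC" + "TAACAACTTTCAGCAGTGGATGTCTAG"

print(leftover_primers(hef041, F_PRIMER, R_PRIMER))  # (True, True)
print(leftover_primers(sef045, F_PRIMER, R_PRIMER))  # (False, False)
```

Applied to every record of the fasta file, a check like this would give a count of trimmed versus untrimmed contigs per sample without having to run trim.seqs.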

Could you send your oligos file and a set of the fastq files that contain a problem read to mothur.bugs@gmail.com so I can track down the issue for you?

Let’s look at one read:

@M02442:176:000000000-BL946:1:1101:10035:1896 1:N:0:ACAGTG
GCTATTGCGCGGAAGGATCATTACCACACCAAAAAAACTTTCCACGTGAACCGTTGTAATTATGTTCTGTGCTTTCCTTCGGGAAGGCTGAACGAAGGTAAGCCGCTTTATTGTGGCTTGCCGACGTACTTTTCAAACCCATTTACTTAATACAGAACTATACTCCGAAAACGAAAGTCTTTGGTTTTAATCAATAACAACTTTCAGCAGTGGATGTCTAGGCTCGCACAACGATGAAGATCGCGTAA
+
CCCCCGGGGGGGGGGGFDGFGGGGGGGGGGG@EEFGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGCCFGGGGGGGGGGGGGGGGGGGGGGGGGG>FGGGGGGGGGGGGGFGEGGGGGGGGGGGGGF9DEGFGGGGFGDGGGEGGGGGGGGGGGGGGFGGFFGGGGFFGGGGGGGGGGFGGGGGG6?FFGGGGFGBFFF9GEFFDFAFC6B<>F@AB>>?

@M02442:176:000000000-BL946:1:1101:10035:1896 2:N:0:ACAGTG
TTACGCGATCTTCATCGTTGTGCGAGCCTAGACATCCACTGCTGAAAGTTGTTATTGATTAAAACCAAAGACTTTCGTTTTCGGAGTATAGTTCTGTATTAAGTAAATGGGTTTGAAAAGTACGTCGGCAAGCCACAATAAAGCGGCTTACCTTCGTTCAGCCTTCCCGAAGGAAAGCACAGAACATAATTACAACGGTTCACGTGGAAAGTTTTTTTGGTGTGGTAATGATCCTTCCGCGCAATAGC
+
CCCCCGGGGGGGGFGGGGGGGGGGGGGGGCDDGGFGFGGGGGGGGGDFGFGFFGFFGGGGGGG<FFG<FGGGFFEGGGG?FGGCGGGGGGGGGGGGGGGGGFGFGGGGGGGGGGGF<FGGGGGEGGGGGGGFGGGGGF,EEGGGGGGGGEDEAFFGGGGGF9FGGGGGGD@CGFFFGGFGFGCGGCFFGGFGD>FGGFFFFAF>0@C?FGBAFFFFF@@>A=21@@?CCECEFF4?E;<?>B0>@EFF

primer GCGGAAGGATCATTACCAC TCTTCATCGDTGTGCGAGC
barcode GCTATTGC TTACGCGA HEF005

@M02442:176:000000000-BL946:1:1101:10035:1896 1:N:0:ACAGTG
GCTATTGCGCGGAAGGATCATTACCACACCAAAAAAACTTTCCACGTGAACCGTTGTAATTATGTTCTGTGC…

@M02442:176:000000000-BL946:1:1101:10035:1896 2:N:0:ACAGTG
TTACGCGATCTTCATCGTTGTGCGAGCCTAGACATCCACTGCTGAAAGTTGTTATTGATTAAAACCAAAG…

R2 is flipped (reverse-complemented), and the two reads are aligned, resulting in:

@M02442:176:000000000-BL946:1:1101:10035:1896 1:N:0:ACAGTG
...........................ACCAAAAAAACTTTCCACGTGAACCGTTGTAATTATGTTCTGTGCTTTCCTTCGGGAAGGCTGAACGAAGGTAAGCCGCTTTATTGTGGCTTGCCGACGTACTTTTCAAACCCATTTACTTAATACAGAACTATACTCCGAAAACGAAAGTCTTTGGTTTTAATCAATAACAACTTTCAGCAGTGGATGTCTAGGCTCGCACAACGATGAAGATCGCGTAA

@M02442:176:000000000-BL946:1:1101:10035:1896 2:N:0:ACAGTG
GCTATTGCGCGGAAGGATCATTACCACACCAAAAAAACTTTCCACGTGAACCGTTGTAATTATGTTCTGTGCTTTCCTTCGGGAAGGCTGAACGAAGGTAAGCCGCTTTATTGTGGCTTGCCGACGTACTTTTCAAACCCATTTACTTAATACAGAACTATACTCCGAAAACGAAAGTCTTTGGTTTTAATCAATAACAACTTTCAGCAGTGGATGTCTAG---------------------------

The reads are then assembled, resulting in:

GCTATTGCGCGGAAGGATCATTACCACACCAAAAAAACTTTCCACGTGAACCGTTGTAATTATGTTCTGTGCTTTCCTTCGGGAAGGCTGAACGAAGGTAAGCCGCTTTATTGTGGCTTGCCGACGTACTTTTCAAACCCATTTACTTAATACAGAACTATACTCCGAAAACGAAAGTCTTTGGTTTTAATCAATAACAACTTTCAGCAGTGGATGTCTAGGCTCGCACAACGATGAAGATCGCGTAA

I can see barcodes and primers on the assembled read, but I think it’s related to your dataset. Pat, what are your thoughts?
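The walk-through above can be reproduced on the read ends alone. The following is a minimal sketch with a hypothetical `revcomp` helper; it only uses the first 27 bases of R1 and R2 and the last 27 bases of the assembled read quoted above, and it assumes plain A/C/G/T input.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a plain A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

r1_head = "GCTATTGCGCGGAAGGATCATTACCAC"      # barcode + F primer at the start of R1
r2_head = "TTACGCGATCTTCATCGTTGTGCGAGC"      # barcode + R primer at the start of R2
contig_head = "GCTATTGCGCGGAAGGATCATTACCAC"  # first 27 bases of the assembled read
contig_tail = "GCTCGCACAACGATGAAGATCGCGTAA"  # last 27 bases of the assembled read

# The contig starts exactly like R1 ...
print(contig_head == r1_head)           # True
# ... and ends with the reverse complement of the start of R2.
print(revcomp(r2_head) == contig_tail)  # True
# The reverse barcode therefore appears reverse-complemented at the 3' end.
print(revcomp("TTACGCGA"))              # TCGCGTAA
```

So on the untrimmed contig the forward barcode and primer sit at the 5' end as written in the oligo file, while the reverse primer and barcode appear reverse-complemented at the 3' end.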

That looks right to me - do these barcodes look familiar, Anna Maria?

It appears that I need to clarify my request: the barcodes that you have identified are exactly right; they correspond to the file I sent. Mothur assembles the reads, recognizes the primers and the barcodes, and renames the sequences according to the oligo file.

The problem is that after this step, the barcodes and the primers should be removed from the sequences.
This happens ONLY FOR SOME OF THE SEQUENCES - apparently, at least 5 million still have the barcodes and the primers attached (checked by running trim.seqs).
This is what I can’t explain.

Do you also get this weird mixture when running make.contigs on the files I sent you (the R1 and R2 fastq files and the oligo file)?
Thank you for your attention,
Anna Maria