How does make.contigs merging work?

jess · February 8, 2021, 3:21pm

While checking some Mothur output, I’ve come across two places where I don’t understand what Mothur is doing. The first is in make.contigs, the second is in pcr.seqs (that question is in a separate post).

When I look at a random sample of raw reads, I see the exact primer match at the start of R1 (forward) and R2 (reverse primer), as expected. I also see an exact match for the reverse complement of the reverse primer in R1, and an imperfect match for the reverse complement of the forward primer in R2. The imperfect forward primer has one mismatch.

I see that the imperfect forward primer appears in the output of make.contigs. So, why does Mothur, when merging, “capture” the imperfect forward primer when a perfect one exists?

I found this to be the case for about 50% of my random sample of reads. It leads me to believe I don’t quite understand how make.contigs works. Here are the parameters I run:

make.contigs(file=Undetermined.paired.files, processors=40, format=illumina1.8+, oligos=amr.oligos, bdiffs=0, pdiffs=0, checkorient=t, insert=25, trimoverlap=f, allfiles=0)

pschloss · February 8, 2021, 6:44pm

Can you post a forward and reverse sequence along with how they’re getting assembled and what you think is wrong with the output?

Pat

jess · February 8, 2021, 6:56pm

Thanks, Pat. Here’s the sequence info (this is the same example read I’ve used for my other post, linked above, on pcr.seqs). I bolded the primers, and I’ve un-bolded the mismatch on the forward primer in R2 (and the make.contigs output).

R1 sequence:

TTGAGATGGTGACAAAGAGAGTGCAACGGATGATGTTCGCGGCGGCGGCGTGCATTCCGCTGCTGCTGGGCAGCGCGCCGCTTTATGCGCAGACGATTGCGGTGCAGCAAAAGCTGGCGGCGCTGGAGAAAAGCAGCGGAGGGCGGCTGGGCGTCGCGCTCATCGATACCGCAGATAATACGCAGGTGCTTTATCGCGGTGATGAACGCTTTCCAATGTGCAGTACCAGTACTGTCTCTTATACACATCTC

R2 sequence

TACTGGTACTGCACATTGGAAAGCGTTCATCACCGCGATAAAGCACCTGCGTATTATCTGCGGTATCGATGAGCGCGACGCCCAGCCGCCCTCCGCTGCTTTTCTCCAGCGCCGCCAGCTTTTGCTGCACCGCACTCGTCTGCGCATAAAGCGGCGCGCTGCCCAGCAGCAGCGGAATGCACGCCGCCGCCGCGAACATCATCCGTTGCACTCTCTTTGTCACCATCCCAACTGTCTCTTACACACATCTT

merged sequence, after make.contigs

AAGATGTGTGTAAGAGACAGTTGGGATGGTGACAAAGAGAGTGCAACGGATGATGTTCGCGGCGGCGGCGTGCATTCCGCTGCTGCTGGGCAGCGCGCCGCTTTATGCGCAGACGAGTGCGGTGCAGCAAAAGCTGGCGGCGCTGGAGAAAAGCAGCGGAGGGCGGCTGGGCGTCGCGCTCATCGATACCGCAGATAATACGCAGGTGCTTTATCGCGGTGATGAACGCTTTCCAATGTGCAGTACCAGTACTGTCTCTTATACACATCTC

forward primer: TTGAGATGGTGACAAAGAGAGTG
reverse complement of the fwd primer: CACTCTCTTTGTCACCATCTCAA

reverse primer: TACTGGTACTGCACATTGGAAAG
rev complement of the reverse primer: CTTTCCAATGTGCAGTACCAGTA

jess · February 9, 2021, 12:54am

Some other relevant info I should have mentioned in the initial post:

1). I’m using Mothur v.1.44.3
2). This is not 16S data. This is highly-multiplexed amplicon sequencing data sequenced on Illumina MiSeq. Instead of 1 primer, we have ~800 primers. It’s designed for near-complete overlap on R1 and R2 (to minimize errors).

Do let me know if you have more questions or you need to see additional data. I have an Excel spreadsheet with all of this info for a random sample of 30 reads. Thanks again for your help!

westcott · February 9, 2021, 8:44pm

The pcr.seqs command will scan the read from left to right and look for the first possible match of the forward primer. Then it will scan the read from right to left looking for the first possible match of the reverse primer. If both are found the read is trimmed, otherwise it’s scrapped.

The make.contigs command is more strict about where primers can be located in the read. The make.contigs command will look for the barcodes and primers ONLY at the ends of the read. If the primers are found, they are removed and the fragments are assembled. Mothur found the exact match of the primers and removed them. It then assembled the remaining read fragments. The imperfect matches were not considered because they are not located at the ends of the read.

AndreasDoetsch · February 15, 2021, 7:11am

Have you tried using trimoverlap=T ?
This should remove surplus sequence from both ends and probably also removes both your perfect and imperfect primer sequences. For the further analyses, you’ll want to remove primers anyway because they are not informative and might even cause misclassification (e.g., if you have ambiguous bases).

jess · February 16, 2021, 7:06pm

Thanks for this suggestions, Andreas! We’re trying it out today. It seems like an elegant solution to our problem.

AndreasDoetsch · February 17, 2021, 6:39am

Be careful though, the behaviour of trimoverlap can be tricky, because it removes any overlap. That means 3’-overlap when you sequence inserts shorter than your readlength, which is good and removes primers and adapter sequences. But also 5’-overlaps that happen when your insert is longer than the readlength (e.g. with V3-V4 and 2x 250 bp), which will remove parts of your insert.

This can cause issues if your lab uses several regions and you just copy-paste your workflow.

jess · February 19, 2021, 7:21pm

Thanks for the heads-up, Andreas. We just finished a trial run with trimoverlap=t and I looked at that data compared to when trimoverlap=f. For my random sample of 30 reads, trimoverlap=t option does exactly what we expect (and what we want). Our design specifies complete read-through, and our inserts are always shorter than our read length, so it makes sense that it is working.

I really appreciate your suggestion. I will continue to scrutinize the data to make sure it’s working as expected and keep an eye out for edge cases, but so far it’s looking great.

system · March 1, 2021, 7:22pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How does pcr.seqs trimming work? Theory behind mothur	6	997	February 26, 2021
combining forward and reverse primer seq Commands in mothur	2	3611	October 25, 2010
make.contigs - recognizing primer Commands in mothur	4	3443	March 22, 2014
Make.contigs question Commands in mothur	2	2197	April 3, 2013
make contig Commands in mothur	1	2122	July 19, 2013

How does make.contigs merging work?

Related topics