make.contigs creates contigs that are too long

amcrisan · December 23, 2013, 7:28pm

Hello!

I have some 18S amplicon data; the amplicon is roughly 500 bp and was sequenced on a MiSeq. When I run summary.seqs after make.contigs, I consistently get a contig size of ~500 bp. Having reviewed the MiSeq SOP, I know that I should be suspicious of such a result. I am trying to troubleshoot why I could be getting contigs of this size. I think it’s one of two scenarios:

1. This is a legitimate problem and something is wrong with my data, causing it not to assemble at all.
2. The small amplicon size suggests that the paired-end reads would overlap, so this contig is not necessarily incorrect; it’s something that happens when the insert size between paired ends is 0 or less.

Any suggestions?

amcrisan · January 3, 2014, 12:02am

Actually - what would be very useful is to understand the dual-indexing approach used in the MiSeq tutorial.

The paired ends don’t perfectly overlap with each other, which is why my sequence sizes are roughly 500 bp. This is because a “dual-indexing” approach to sequencing wasn’t used (at least it doesn’t seem like it). So I do have non-overlapping paired-end data where the insert size between the paired-end reads is variable. So, given such data, is mothur still a reliable tool?

pschloss · January 6, 2014, 8:34pm

If you don’t have overlapping reads I would question what tool would be “reliable”. I’d suggest just processing each read on its own and using a phylotyping approach. Your error rates are going to be so high that I wouldn’t bother doing OTUs.

Pat

amcrisan · January 8, 2014, 8:35pm

Thanks - I will do that and as a sanity check verify whether the two paired ends get the same results … or at least hopefully similar.
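
A rough sketch of that sanity check, in Python. It assumes each read set has been classified separately and that the assignments are available as tab-separated lines of the form `read_id<TAB>taxon(conf);taxon(conf);...`, with the same read IDs in both sets. The function names and the example assignments are made up for illustration; they are not mothur output.

```python
import re

def parse_taxonomy(line):
    """Split one 'read_id<TAB>taxon(conf);taxon(conf);...' line into (id, [taxa])."""
    read_id, tax = line.rstrip("\n").split("\t")
    # Drop the trailing ';' and the '(confidence)' suffix on each rank.
    ranks = [re.sub(r"\(\d+\)$", "", t) for t in tax.strip(";").split(";")]
    return read_id, ranks

def agreement(forward_lines, reverse_lines, depth):
    """Fraction of shared read IDs whose first `depth` ranks are identical."""
    fwd = dict(parse_taxonomy(l) for l in forward_lines)
    rev = dict(parse_taxonomy(l) for l in reverse_lines)
    shared = fwd.keys() & rev.keys()
    if not shared:
        return 0.0
    same = sum(fwd[r][:depth] == rev[r][:depth] for r in shared)
    return same / len(shared)

# Hypothetical assignments, for illustration only.
forward = ["r1\tEukaryota(100);Fungi(98);Ascomycota(90);",
           "r2\tEukaryota(100);Metazoa(95);Nematoda(80);"]
reverse = ["r1\tEukaryota(100);Fungi(97);Basidiomycota(85);",
           "r2\tEukaryota(100);Metazoa(99);Nematoda(91);"]

print(agreement(forward, reverse, depth=2))  # agreement at the second rank
print(agreement(forward, reverse, depth=3))  # agreement at the third rank
```

Comparing at several depths shows where the two reads start to disagree, which is probably more informative than a single yes/no answer.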

Karin · January 15, 2014, 9:47am

Hi Pat,
I have a similar case: my sequences come from a 250-base MiSeq run, and as my region is about 500 bases long, there should be no overlap.

If you don’t have overlapping reads I would question what tool would be “reliable”. I’d suggest just processing each read on its own and using a phylotyping approach. Your error rates are going to be so high that I wouldn’t bother doing OTUs. Pat

However, according to the sequencing company, they are able to tell which beginning/end sequences belong together by the way the sequencing library is prepared. Would it be okay to use mothur in this case?
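
The "no overlap" arithmetic for this case can be sketched in a few lines of Python. This assumes both reads are full length and untrimmed, and that the region length is measured from the start of one read to the start of the other; `expected_overlap` is a hypothetical helper, not a mothur function.

```python
def expected_overlap(read_length, amplicon_length):
    """Bases covered by both reads; zero or negative means the reads do not overlap."""
    return 2 * read_length - amplicon_length

# 250-base reads against a few plausible region lengths around 500 bases.
for amplicon in (450, 500, 550):
    overlap = expected_overlap(250, amplicon)
    if overlap > 0:
        status = f"{overlap} bases of overlap"
    elif overlap == 0:
        status = "reads just touch, no overlap"
    else:
        status = f"gap of {-overlap} unsequenced bases"
    print(amplicon, status)
```

With 250-base reads, a region of exactly 500 bases gives zero overlap, and anything longer leaves a stretch in the middle that neither read covers.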

pschloss · January 15, 2014, 8:48pm

Hi Karin,

We can definitely know what goes with what. The problem is that I doubt your amplicons are exactly 500 bp, and so it’s unclear how many N’s you’d have to shove in between to make a contig. Even if you did such a thing, it would be hard to know what that means. Another problem is that unless the reads fully overlap, the individual reads, and in your case the contig, would have a very high error rate.

pat
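
To illustrate the N-padding problem described above, here is a minimal Python sketch. It is not something mothur does; `pad_join` is a hypothetical helper and the short sequences are made up. The point is that the number of N’s is set entirely by the amplicon length you choose to assume.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T and N."""
    return seq.translate(COMPLEMENT)[::-1]

def pad_join(forward, reverse, assumed_amplicon_length):
    """Join non-overlapping reads with N's to reach an assumed amplicon length."""
    gap = assumed_amplicon_length - len(forward) - len(reverse)
    if gap < 0:
        raise ValueError("reads would overlap at this length; padding does not apply")
    return forward + "N" * gap + reverse_complement(reverse)

# Toy reads, for illustration only.
fwd = "ACGTAC"
rev = "TTGACA"
for length in (12, 15, 18):
    print(length, pad_join(fwd, rev, length))
```

The same pair of reads yields a different "contig" for every assumed length, so if the true amplicon length varies from sequence to sequence, the padded positions cannot be expected to line up.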
