make.contigs creates contigs that are too long

Hello!

I have some 18S amplicon data sequenced on a MiSeq; the amplicon is roughly 500 bp. When I run summary.seqs after make.contigs, I consistently get a contig size of ~500 bp. From reviewing the MiSeq SOP, I know that I should be suspicious of such a result. I am trying to troubleshoot why I could be getting contigs of this size. I think it’s one of two scenarios:

  1. This is a legitimate problem and something is wrong with my data, causing it not to assemble at all.
  2. The small amplicon size suggests that the paired-end reads would overlap, so this contig is not necessarily incorrect; it’s something that happens when the insert size between the paired ends is 0 or less.

Any suggestions?

Actually - what would be very useful is to understand the dual-indexing approach used in the MiSeq tutorial.

The paired ends don’t perfectly overlap with each other, which is why my sequence sizes are roughly 500 bp. This is because a “dual-indexing” approach to sequencing wasn’t used (at least it doesn’t seem like it). So I do have non-overlapping paired-end data where the insert size between the paired-end reads is variable. So, given such data, is mothur still a reliable tool?

If you don’t have overlapping reads I would question what tool would be “reliable”. I’d suggest just processing each read on its own and using a phylotyping approach. Your error rates are going to be so high that I wouldn’t bother doing OTUs.

Pat

Thanks - I will do that and as a sanity check verify whether the two paired ends get the same results … or at least hopefully similar.

Hi Pat,
I have a similar case: my sequences come from a 250-base MiSeq run, and as my region is about 500 bases long, there should be no overlap.
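A rough way to check the "no overlap" expectation is to compare the combined read length with the region length. The sketch below is only illustrative: the function name `expected_overlap` is hypothetical, and it assumes both reads are full length (no trimming) and that the region length includes whatever the reads actually start from.

```python
def expected_overlap(read_length: int, amplicon_length: int) -> int:
    """Number of bases covered by both reads of a pair.

    Zero means the reads just meet; a negative value is the size of the
    unsequenced gap between them. Assumes equal, untrimmed read lengths.
    """
    return 2 * read_length - amplicon_length


# 2 x 250 reads against amplicons near the ~500 base region length.
# The 450 and 550 values are made-up examples of length variation.
for amplicon_length in (450, 500, 550):
    overlap = expected_overlap(250, amplicon_length)
    print(f"amplicon {amplicon_length} bp -> overlap {overlap} bp")
```

With 250-base reads and a region of about 500 bases the result sits around zero, so small differences in amplicon length decide whether a pair barely overlaps or leaves a gap.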

Pat wrote: If you don’t have overlapping reads I would question what tool would be “reliable”. I’d suggest just processing each read on its own and using a phylotyping approach. Your error rates are going to be so high that I wouldn’t bother doing OTUs.

However, according to the sequencing company, they are able to tell which beginning/end sequences belong together by the way the sequencing library is prepared. Would it be okay to use mothur in this case?

Hi Karin,

We can definitely know what goes with what. The problem is that I doubt your amplicons are exactly 500 bp, so it’s unclear how many N’s you’d have to shove in between the reads to make a contig. Even if you did such a thing, it would be hard to know what that means. Another problem is that unless the reads fully overlap, the individual reads, and in your case the contig, would have a very high error rate.

pat
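To illustrate the N-padding problem described above, here is a minimal sketch of joining a non-overlapping read pair with N's. This is not what make.contigs does; `pad_contig` and `reverse_complement` are hypothetical helpers, and the amplicon length passed in is exactly the value that is unknown in practice.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def pad_contig(forward: str, reverse: str, amplicon_length: int) -> str:
    """Join a non-overlapping read pair, filling the gap with N's.

    amplicon_length is an assumption supplied by the caller; if the true
    amplicon is longer or shorter, the number of N's is simply wrong.
    """
    gap = amplicon_length - len(forward) - len(reverse)
    if gap < 0:
        raise ValueError("reads would overlap at this amplicon length")
    return forward + "N" * gap + reverse_complement(reverse)


# Toy reads: the same pair gives different "contigs" depending on the
# amplicon length that is assumed.
print(pad_contig("ACGT", "TTGG", 10))
print(pad_contig("ACGT", "TTGG", 12))
```

The same pair produces a different sequence for every assumed amplicon length, which is why the resulting contig is hard to interpret.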