I hope you can answer this question before I start processing my MiSeq 2x300 sequences (I have long amplicons with as little as 20-50 bp of overlap). Can a minimum overlap length be defined in the make.contigs command, or is there already a default (for instance, a minimum of 15 bases of overlap)?
Or would you recommend doing this with dedicated read-merging software (like FLASH or PANDAseq) before getting started with mothur?
Thanks a lot for your help!
So I’m really not the person to ask about this, as we’ve shown (Kozich 2013 AEM) that what you’re trying to do is a bad idea: you get very little denoising when you have anything but complete overlap of the reads. That said, you can use screen.seqs to specify things like the minimum overlap.
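A minimal sketch of how that might look in a mothur batch file. The fastq and output file names are hypothetical, and the exact names make.contigs writes differ between mothur versions, so check your own output. The 20-base threshold is only the lower end of the overlap mentioned in the question, not a recommendation.

```
# Assemble the read pairs; this also writes a contigs report
# that records the overlap length for each pair.
make.contigs(ffastq=sample_R1.fastq, rfastq=sample_R2.fastq)

# Remove contigs whose reads overlapped by fewer than 20 bases.
# The file names are placeholders for whatever make.contigs produced.
screen.seqs(fasta=sample_R1.trim.contigs.fasta, contigsreport=sample_R1.contigs.report, minoverlap=20)
```

The minoverlap filter only applies when the contigs report is supplied through contigsreport, since that file is where the overlap lengths are read from.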
Hope this helps,
Thanks for your answer; I am sure screen.seqs will do. And thanks for the reference!
Of course I can see the advantage of fully overlapping reads, but I don’t work with bacteria, so I don’t always have a choice of primer pairs, and 454 wasn’t an option either.
Also, Illumina is now officially recommending a 550bp amplicon for bacteria, and I guess many people will use it in the future (http://support.illumina.com/downloads/16s_metagenomic_sequencing_library_preparation.ilmn).
Understood. So the tradeoff is that you (and the people who listen to Illumina) won’t be able to do an OTU-based analysis, since there will be so many unique reads due solely to sequencing errors. That leaves database-dependent classification methods as the only available option.