make.contigs - an automatic trimoverlap?

I was using ‘make.contigs’ to combine the forward and reverse FASTQ files downloaded from an Illumina MiSeq run. I have two kinds of raw data, 16S rRNA and 18S rRNA sequences, and both are processed the same way:

mothur > make.contigs(file,processors=16)
mothur > screen.seqs(fasta,qfile,contigsreport,minoverlap=25,mismatches=5)
mothur > trim.seqs(fasta,qfile,qaverage=30,qwindowsize=50,qwindowaverage=20)
mothur > trim.seqs(fasta,oligos,maxambig=0,maxhomop=6)

Used as-is on both datasets, these commands produce downstream errors for either the 16S or the 18S data. The key is the ‘trimoverlap’ parameter. Its default value, FALSE, works when processing 16S data. In practice, the value must be FALSE for 16S and TRUE for 18S; otherwise the downstream ‘trim.seqs’ command throws an error while splitting the libraries.

I think I have found the answer. On our Illumina MiSeq run, paired-end sequencing gives reads of about 250 bp from each end. My target amplicons, however, are about 340 bp for 16S V4-V5 and about 140 bp for 18S. For 16S, each raw read is shorter than the target sequence, so the forward and reverse reads each cover only part of the full amplicon. Setting ‘trimoverlap’ to FALSE keeps the primer and barcode information at the read ends, so ‘trim.seqs’ can still use it. For 18S, on the other hand, the whole target sequence is shorter than either read, so only the overlapping bases are meaningful. That is why I ended up using two different strategies for 16S and 18S.
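To make the length argument concrete, here is a rough Python sketch (not mothur code) of where each 250-nt read falls on the amplicon. It assumes both reads start exactly at the amplicon ends and align without indels; the helpers `overlap_region` and `describe` are hypothetical. Keeping only the region covered by both reads, which is what trimoverlap=T is described as doing, would drop the amplicon ends (where the primers and barcodes sit) for 16S, but for 18S it only removes the part of each read that runs past the end of the fragment.

```python
# Hypothetical model, not mothur's implementation: reads start exactly at the
# amplicon ends and align without indels.

def overlap_region(read_len: int, amplicon_len: int) -> tuple[int, int]:
    """Return the [start, end) span of the amplicon covered by BOTH reads.

    The forward read covers [0, min(read_len, amplicon_len)); the reverse read
    covers [max(0, amplicon_len - read_len), amplicon_len). Anything a read
    sequences beyond the far end of the amplicon is adapter read-through.
    """
    fwd_end = min(read_len, amplicon_len)
    rev_start = max(0, amplicon_len - read_len)
    if rev_start >= fwd_end:
        return (0, 0)  # the two reads do not overlap at all
    return (rev_start, fwd_end)


def describe(name: str, read_len: int, amplicon_len: int) -> dict:
    start, end = overlap_region(read_len, amplicon_len)
    overlap = end - start
    return {
        "target": name,
        "overlap_bp": overlap,
        # amplicon bases discarded if only the overlap is kept
        "trimmed_bp": amplicon_len - overlap,
        # bases each read sequences past the far end of the amplicon
        "read_through_bp": max(0, read_len - amplicon_len),
    }


for name, amp in [("16S V4-V5", 340), ("18S", 140)]:
    print(describe(name, 250, amp))
```

Under these assumptions the 16S reads overlap by 160 bp and trimming to the overlap discards 180 bp of amplicon ends, while the 18S reads cover the whole 140-bp fragment and each run 110 bp past its end.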

However, I wonder whether I am simply not making the best use of the ‘trim.seqs’ command.
So, as the title asks: could ‘trimoverlap’ be set automatically, simply by comparing the known target sequence length with the reads waiting to be assembled into contigs?
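One possible form of such an automatic check, sketched in Python under loud assumptions: the FASTQ files are plain, unwrapped 4-line records, and the rule is simply "suggest trimoverlap=T when the median read is longer than the expected amplicon". The functions `read_lengths` and `suggest_trimoverlap` are hypothetical; mothur does not provide them, and this only automates the rule of thumb from my own runs.

```python
import io
import statistics
from typing import Iterator, TextIO


def read_lengths(fastq: TextIO) -> Iterator[int]:
    """Yield the length of each sequence in an unwrapped 4-line FASTQ stream."""
    for i, line in enumerate(fastq):
        if i % 4 == 1:  # line 2 of every record holds the sequence
            yield len(line.strip())


def suggest_trimoverlap(fastq: TextIO, expected_amplicon_len: int) -> bool:
    """Hypothetical rule: return True (suggest trimoverlap=T) when typical
    reads are longer than the expected amplicon, i.e. they run past its end."""
    median_len = statistics.median(read_lengths(fastq))
    return median_len > expected_amplicon_len


# Usage with an in-memory stand-in for a real forward-read FASTQ file:
demo = io.StringIO("@r1\n" + "A" * 250 + "\n+\n" + "I" * 250 + "\n")
print(suggest_trimoverlap(demo, 140))  # 250-nt reads vs a 140-bp 18S target
```

For a real file you would pass an open handle (for example from `open(path)`) instead of the `StringIO` stand-in.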

I am an amateur in molecular microbiology, and the suggestion above is based only on my own experiments, so it may turn out to be wrong. If so, I'd like to be told. Please forgive me if this sounds opinionated.

Thanks a lot.

P.S. Please forgive my poor English.

Hi,

A few things…

  1. This series of commands is not appropriate for MiSeq data because the quality score data is meaningless…
mothur > make.contigs(file,processors=16)
mothur > screen.seqs(fasta,qfile,contigsreport,minoverlap=25,mismatches=5)
mothur > trim.seqs(fasta,qfile,qaverage=30,qwindowsize=50,qwindowaverage=20)
mothur > trim.seqs(fasta,oligos,maxambig=0,maxhomop=6)

You really should follow the MiSeq SOP steps (http://www.mothur.org/wiki/MiSeq_SOP). Also note that it is pretty critical that the reads fully overlap. For your V3V4 data you should read this: http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix%3F/

  2. Your 18S fragment is only 140 nt long:

The key is the ‘trimoverlap’ parameter. Its default value, FALSE, works when processing 16S data. In practice, the value must be FALSE for 16S and TRUE for 18S; otherwise the downstream ‘trim.seqs’ command throws an error while splitting the libraries.

Actually, I’d argue that you used the wrong sequencing technology. Illumina admits that generating reads longer than the fragment increases the error rate for all of the fragments in the run, so I suspect your 18S data are highly error prone. I’d suggest using a 2x150 kit and then using the trimoverlap option in make.contigs. The 2x150 kit will work there because your reads won’t run over the ends of the fragments.

Hope this helps,
Pat