trim.seqs bug or not?

When I analyzed pyro-reads of 16S rRNA genes, I found some weird results.
Two sets of commands listed below gave me different results, although I expected the same fasta files.
The former set gave me a fasta file which contains shorter than 200 bp although I cut the sequences at minlength of 200.
I didn’t use any oligo file because I already had sorted files.
Is it a bug or is there anything that I don’t know?
If there is a bug like this, it might cause severe problems for further analyses.

mothur “#trim.seqs(fasta=./data\primerA1.fna, qfile=./data\primerA1.qual, minlength=200, maxlength=370)”
mothur “#trim.seqs(fasta=./data\primerA1.trim.fasta, qfile=./data\primerA1.trim.qual, maxambig=0, maxhomop=8, qwindowaverage=35, qwindowsize=50)”

mothur “#trim.seqs(fasta=./data\primerA1.fna, qfile=./data\primerA1.qual, maxambig=0, maxhomop=8, qwindowaverage=35, qwindowsize=50)”
mothur “#trim.seqs(fasta=./data\primerA1.trim.fasta, qfile=./data\primerA1.trim.qual, minlength=200, maxlength=370)”

Not a bug :slight_smile: This difference is due to the order of how we do the different steps within trim.seqs. When a sequence comes in, here’s the ordering of what happens…

  1. Find and strip off the barcode
  2. Find and strip off the forward primer
  3. Find and strip off the reverse primer (we don’t recommend)
  4. Quality score-based trimming
  5. Check for length
  6. Check for homopolymer length
  7. Check for ambiguous bases

So if you tell mothur to do 5, 6, and 7 and then go back and do 4 you will get a different result from if you do 4 first. Think about a sequence that comes off the sequencer at 250 bases. It’s longer than 200 bp so you’d keep it by your first approach and then trim it to whatever based on the quality scores. If instead you take the second approach you describe you’d first trim that sequences to perhaps 170 bases and then chuck it because it’s shorter than 200 bp.

Hope this helps…
Pat