Bug in trim.seqs

Saxphile · November 11, 2009, 3:21am

I’m getting some odd behavior when using trim.seqs with the qfile arguement when it’s not actually needed. Again I think an example will be more clear:

First I trim the sequences using average Q score as a threshold. Mothur didn’t throw up any mistakes so I’m assuming that qual file looked okay.

mothur > trim.seqs(fasta=Data.fasta, qfile=Data.qual, qaverage=25)
mothur > system(mv Data.trim.fasta Data.Q25.trim.fasta)
mothur > summary.seqs(fasta=Data.Q25.trim.fasta)
Start End NBases Ambigs Polymer
Minimum: 1 30 30 0 1
2.5%-tile: 1 34 34 0 2
25%-tile: 1 94 94 0 3
Median: 1 97 97 0 3
75%-tile: 1 97 97 0 3
97.5%-tile: 1 97 97 1 4
Maximum: 1 447 447 14 31

of Seqs: 151527

I then try to trim the sequences using some additional criteria which don’t actually require the qual file, but I left the qfile argument in anyway.

mothur > trim.seqs(fasta=Data.Q25.trim.fasta, qfile=Data.qual, maxambig=0, minlength=88, maxlength=107)
mothur > system(mv Data.Q25.trim.trim.fasta Data.Q25.QC.trim.fasta)
mothur > summary.seqs(fasta=Data.Q25.QC.trim.fasta)
Start End NBases Ambigs Polymer
Minimum: 1 88 88 0 2
2.5%-tile: 1 91 91 0 2
25%-tile: 1 96 96 0 3
Median: 1 97 97 0 3
75%-tile: 1 97 97 0 3
97.5%-tile: 1 97 97 0 3
Maximum: 1 107 107 0 7

of Seqs: 117266

Now I try the same thing but take the qfile argument out.

mothur > trim.seqs(fasta=Data.Q25.trim.fasta, maxambig=0, minlength=88, maxlength=107)
mothur > system(mv Data.Q25.trim.trim.fasta Data.Q25.QC.trim.fasta)
mothur > summary.seqs(fasta=Data.Q25.QC.trim.fasta)
Start End NBases Ambigs Polymer
Minimum: 1 88 88 0 2
2.5%-tile: 1 91 91 0 2
25%-tile: 1 96 96 0 3
Median: 1 97 97 0 3
75%-tile: 1 97 97 0 3
97.5%-tile: 1 97 97 0 3
Maximum: 1 107 107 0 7

of Seqs: 120644

I randomly picked a few sequences in the fasta and qual files to check that their length actually match, and as far as I can see, they did. Is anyone else able to replicate this behavior?

pschloss · November 11, 2009, 1:18pm

If you open the qual file in a text editor you’ll see that you get as many qual scores per sequence as the same, untrimmed sequence is long. So when you give the trimmed file and the qual file, weird things are bound to happen.

Saxphile · November 11, 2009, 5:49pm

So does that mean any trimming related to quality (i.e., qaverage and qtrim) need to be done before other types of trimming. I’m not sure I understood you 100%. Thanks.

pschloss · November 11, 2009, 6:34pm

you can do everything together:

mothur > trim.seqs(fasta=Data.fasta, qfile=Data.qual, qaverage=25, maxambig=0, minlength=88, maxlength=107)

should yield…

Start End NBases Ambigs Polymer
Minimum: 1 88 88 0 2
2.5%-tile: 1 91 91 0 2
25%-tile: 1 96 96 0 3
Median: 1 97 97 0 3
75%-tile: 1 97 97 0 3
97.5%-tile: 1 97 97 0 3
Maximum: 1 107 107 0 7

of Seqs: 120644

Saxphile · November 11, 2009, 10:25pm

Oh I know that. I was trying to see how many sequences each of the criteria trimmed, so I did it in multiple steps. Since the current behavior deviates from what people may expect, perhaps it would be a good idea to document it?

rahul12 · December 27, 2012, 7:18am

yes indeed, these are pyrotags, at the 3’ end only. I don’t know how I got some forward and some backward. Perhaps that happened before I saw the data–someone else did the original QC.

Topic		Replies	Views
Bugs in trim.seqs mothur bugs	5	92482	November 10, 2009
problem with trim.seqs Theory behind mothur	9	10414	May 4, 2011
Issues with trim.seqs qtrim parameter Commands in mothur	2	3177	January 14, 2011
Trim seqs command issues Commands in mothur	6	4872	October 4, 2011
trim.seqs qual and fasta file seq name mismatch Commands in mothur	1	1109	January 19, 2016

Bug in trim.seqs

of Seqs: 151527

of Seqs: 117266

of Seqs: 120644

of Seqs: 120644

Related topics