filter.seqs bug?

kimitas · January 29, 2013, 5:44am

Hi there,
I am not sure if it is a bug or me… :oops:
mothur > summary.seqs(fasta=16S_juvs_all.trim.rename.unique.align)

So this is what I had after the alignment (SILVA):

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1044 1046 2 0 1 1
2.5%-tile: 1044 5711 246 0 4 1638
25%-tile: 1044 8411 370 0 5 16377
Median: 1044 9964 406 0 5 32753
75%-tile: 1044 11888 446 0 5 49129
97.5%-tile: 1044 13862 493 0 6 63868
Maximum: 43112 43116 531 0 8 65505
Mean: 1178.13 10195.5 397.359 0 4.89766

# of Seqs: 65505

And this is what I did for screening:

mothur > screen.seqs(fasta=16S_juvs_all.trim.rename.unique.align, name=16S_juvs_all.trim.rename.names, minlength=200, start=1044)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1044 4710 200 0 3 1
2.5%-tile: 1044 6091 254 0 4 1605
25%-tile: 1044 8411 372 0 5 16043
Median: 1044 9964 406 0 5 32085
75%-tile: 1044 11888 446 0 5 48127
97.5%-tile: 1044 13862 493 0 6 62565
Maximum: 1044 15634 531 0 8 64169
Mean: 1044 10118.8 399.824 0 4.90717

# of Seqs: 64169

…but then, when I ran the filter, I got shorter sequences!

mothur > filter.seqs(fasta=16S_juvs_all.trim.rename.unique.good.align, vertical=T, trump=.)

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 650 130 0 3 1
2.5%-tile: 1 701 148 0 3 1605
25%-tile: 1 701 152 0 4 16043
Median: 1 701 161 0 5 32085
75%-tile: 1 701 168 0 5 48127
97.5%-tile: 1 701 187 0 6 62565
Maximum: 1 701 213 0 8 64169
Mean: 1 700.996 161.918 0 4.62301

# of Seqs: 64169

So then I ran it without trump and vertical, and I got my sequence lengths back… but obviously now I have a longer alignment. How can I fix this?

mothur > filter.seqs(fasta=16S_juvs_all.trim.rename.unique.good.align)

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 701 200 0 3 1
2.5%-tile: 1 993 254 0 4 1605
25%-tile: 1 1241 372 0 5 16043
Median: 1 1370 406 0 5 32085
75%-tile: 1 1516 446 0 5 48127
97.5%-tile: 1 1565 493 0 6 62565
Maximum: 1 1617 531 0 8 64169
Mean: 1 1360.34 399.824 0 4.90717

# of Seqs: 64169

Output File Name:
16S_juvs_all.trim.rename.unique.good.filter.summary

Thanks! Kim
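Why the vertical=T, trump=. run in the post above shrank everything to 701 columns can be seen from how those two options act on alignment columns. Below is a minimal Python sketch, not mothur's implementation. It assumes, following the mothur documentation, that vertical=T drops columns that contain only gap characters ('-' or '.') and that trump=. drops any column in which at least one sequence has a '.' (missing data before the first or after the last base). The function name filter_columns and the toy alignment are hypothetical.

```python
def filter_columns(alignment, vertical=True, trump="."):
    """Column filter in the spirit of mothur's filter.seqs (a sketch, not mothur's code).

    Assumes all aligned sequences have equal length, '-' marks internal gaps
    and '.' marks missing data before the first / after the last base.
    """
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for col in range(width):
        column = [seq[col] for seq in alignment]
        # vertical=T (assumed semantics): drop columns that are gaps in every sequence
        if vertical and all(c in "-." for c in column):
            continue
        # trump=. (assumed semantics): drop a column if ANY sequence has '.' there,
        # so one read that ends early truncates the filtered alignment for everyone
        if trump and any(c == trump for c in column):
            continue
        keep.append(col)
    return ["".join(seq[col] for col in keep) for seq in alignment]


# Toy alignment (hypothetical): the second read stops early.
aln = [
    "..AC-GTACGTA..",
    "..AC-GTAC.....",
    "..ACTGTACGTAC.",
]
print(filter_columns(aln))              # ['AC-GTAC', 'AC-GTAC', 'ACTGTAC']
print(filter_columns(aln, trump=None))  # vertical only: 11 columns kept
```

With trump=. the filtered alignment can be no longer than the region every read covers, so after screening by minlength (where the shortest read ended at column 4710) the common region was small.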

pschloss · January 29, 2013, 1:56pm

Hi Kim,

The issue is that the V1 region is problematic because there tend to be lineages that have significant insertions/introns within the region (e.g. TM7 comes to mind). So if you calibrate everything by length, one 200 bp fragment may only go halfway into the alignment while another will go much further. Instead of setting minlength=200, can you try end=5711? This way you know all of the sequences span the same alignment region.
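Pat's point about length versus alignment coordinates can be illustrated with a small Python sketch. It assumes that Start/End are the 1-based columns of the first and last non-gap character (as summary.seqs reports them, with 0 0 for an empty sequence), and that screen.seqs start= removes sequences starting after that column while end= removes sequences ending before it. The helpers start_end and screen, and the toy sequences, are hypothetical.

```python
GAP_CHARS = "-."


def start_end(aligned):
    """1-based alignment columns of the first and last base; (0, 0) if no bases."""
    positions = [i for i, c in enumerate(aligned, start=1) if c not in GAP_CHARS]
    if not positions:
        return 0, 0
    return positions[0], positions[-1]


def screen(alignment, start=None, end=None, minlength=None):
    """Hypothetical screen: keep reads that start at/before `start`,
    end at/after `end`, and have at least `minlength` bases."""
    kept = []
    for seq in alignment:
        s, e = start_end(seq)
        nbases = sum(c not in GAP_CHARS for c in seq)
        if start is not None and s > start:
            continue
        if end is not None and e < end:
            continue
        if minlength is not None and nbases < minlength:
            continue
        kept.append(seq)
    return kept


# Toy alignment (hypothetical). Read 1 carries an insertion in columns 5-9,
# so it reaches 10 bases by column 10; reads 2 and 3 have gaps there.
aln = [
    "ACGTAAAAAC....",  # 10 bases, End = 10
    "ACGT-----CGTAC",  # 9 bases, End = 14
    "ACGT-----CGTA.",  # 8 bases, End = 13
]
by_length = screen(aln, start=1, minlength=8)
print([start_end(s)[1] for s in by_length])  # [10, 14, 13]: same length filter, different ends
by_end = screen(aln, start=1, end=13)
print([start_end(s)[1] for s in by_end])     # [14, 13]: all cover columns 1-13
```

Screening on a length threshold lets reads with an insertion stop early in the alignment; screening on end guarantees a shared region for the later trump=. filter.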

Pat

kimitas · January 30, 2013, 5:33am

Great! thanks

kimitas · January 30, 2013, 7:02am

…but I still have another question.
Why end at 5711 and not at 13862 or 8411?
For example I have in another set of sequences this:

Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 1044 1109 12 0 2 1442
25%-tile: 1044 8365 285 0 4 14417
Median: 1044 9916 399 0 5 28833
75%-tile: 1044 13130 470 0 5 43249
97.5%-tile: 43061 43116 501 0 6 56223
Maximum: 43116 43117 535 0 8 57664
Mean: 7144.03 14110.3 336.496 0 4.58171

# of Seqs: 57664

So in this case I did:

mothur > screen.seqs(fasta=16S_ads_all.trim.rename.unique.align, name=16S_ads_all.trim.rename.names, group=16S_ads_all.trim.rename.groups, start=1044, end=8365)

And after filtering I got:
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 1039 325 0 3 1
2.5%-tile: 1 1042 351 0 4 849
25%-tile: 1 1042 381 0 5 8486
Median: 1 1042 383 0 5 16971
75%-tile: 1 1042 395 0 5 25456
97.5%-tile: 1 1042 398 0 6 33093
Maximum: 1 1042 463 0 8 33941
Mean: 1 1042 384.009 0 5.07274

# of Seqs: 33941

Output File Name:
16S_ads_all.trim.rename.unique.good.filter.summary

So it looks good… but I have trouble understanding which alignment length is best.

thanks
kim

pschloss · January 30, 2013, 11:52am

Sure - that’s up to you. I suggested 5711 because that would allow you to use the most sequences for the length you seemed to be interested in. If you’re running these through trim.flows/shhh.flows then your reads will generally be in the 250-300 bp range.
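The choice of end is a trade-off: a later cutoff keeps more alignment columns but fewer sequences. In Kim's first summary, 5711 is the 2.5%-tile of End, so end=5711 keeps roughly 97.5% of the sequences in that summary, whereas 8411 (the 25%-tile) keeps about 75%. A minimal sketch of that bookkeeping, assuming screen.seqs(end=X) keeps reads whose End is at or after X; the function retained_by_end and the End values below are made up for illustration.

```python
def retained_by_end(ends, cutoffs):
    """For each candidate end= cutoff, count the reads whose End >= cutoff.

    Assumes screen.seqs(end=X) keeps reads ending at or after column X;
    a later cutoff keeps more alignment columns but fewer reads.
    """
    total = len(ends)
    result = {}
    for cut in sorted(cutoffs):
        kept = sum(1 for e in ends if e >= cut)
        result[cut] = (kept, kept / total)
    return result


# Made-up End positions for ten reads (illustration only, not Kim's data).
ends = [4710, 5711, 6091, 8411, 8411, 9964, 9964, 11888, 13862, 15634]
for cut, (kept, frac) in retained_by_end(ends, [5711, 8411, 13862]).items():
    print(f"end={cut}: keeps {kept} reads ({frac:.0%})")
```

Reading the End percentiles from summary.seqs this way lets you pick the latest cutoff that still retains the share of sequences you need.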
