screen.seqs: huge loss of seqs?

Hi,

I ran screen.seqs(fasta=HX1JDSX01.shhh.trim.align, name=HX1JDSX01.shhh.trim.names, group=HX1JDSX01.shhh.groups, optimize=start, end=6409, criteria=95, processors=20)

My concern is why I'm losing so many sequences in this step. Most of the seqs already seem to cover the same region of the 16S before the command. Is it due to my start/end/criteria settings?

Before screen.seqs: 230330 seqs
After screen.seqs: 2632 seqs

mothur > summary.seqs(fasta=HX1JDSX01.shhh.trim.align, name=HX1JDSX01.shhh.trim.names,processors=20)

Using 20 processors.

             Start     End       NBases    Ambigs    Polymer   NumSeqs
Minimum:     0         0         0         0         1         1
2.5%-tile:   1044      5443      248       0         3         5759
25%-tile:    1044      6091      256       0         4         57583
Median:      1044      6109      273       0         5         115166
75%-tile:    1044      6202      279       0         5         172748
97.5%-tile:  1079      6409      296       0         6         224572
Maximum:     43115     43116     309       0         8         230330
Mean:        1137.62   6131.41   269.069   0         4.44837
# of unique seqs: 89050
total # of seqs: 230330

mothur > summary.seqs(fasta=HX1JDSX01.shhh.trim.good.align, name=HX1JDSX01.shhh.trim.good.names)

Using 1 processors.

             Start     End       NBases    Ambigs    Polymer   NumSeqs
Minimum:     1044      6409      276       0         4         1
2.5%-tile:   1044      6409      287       0         4         66
25%-tile:    1044      6411      290       0         5         659
Median:      1044      6418      292       0         5         1317
75%-tile:    1044      6420      293       0         5         1975
97.5%-tile:  1044      6424      295       0         5         2567
Maximum:     1044      6447      302       0         7         2632
Mean:        1044      6415.7    291.258   0         4.98594
# of unique seqs: 1346
total # of seqs: 2632

Thanks,

What you probably want is either…


screen.seqs(fasta=HX1JDSX01.shhh.trim.align, name=HX1JDSX01.shhh.trim.names, group=HX1JDSX01.shhh.groups, optimize=start, end=6091, criteria=95, processors=20)

or

screen.seqs(fasta=HX1JDSX01.shhh.trim.align, name=HX1JDSX01.shhh.trim.names, group=HX1JDSX01.shhh.groups, optimize=end, start=1044, criteria=95, processors=20)

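If you sequenced from the 5’ end, I’d use the latter; if you sequenced from the 3’ end, the former. Remember that the start parameter means the sequence must start at or before that position, and the end parameter means it must end at or after that position. With end=6409, which is the 97.5%-tile of your end positions, only the small fraction of reads that extend at least that far can pass.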
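To make the start/end rule concrete, here is a minimal Python sketch of the filter as described: a read is kept only if it starts at or before `start` and ends at or after `end`. This is a simplified model for illustration, not mothur's actual implementation; the `AlignedSeq` class, the `passes_screen` and `screen` helpers, and the five reads are all hypothetical, with positions loosely based on the percentiles in the summary.seqs output above.

```python
from dataclasses import dataclass


@dataclass
class AlignedSeq:
    name: str
    start: int  # first alignment column covered by the read
    end: int    # last alignment column covered by the read


def passes_screen(seq, start, end):
    """Keep a read only if it starts at or before `start` and ends at or after `end`."""
    return seq.start <= start and seq.end >= end


def screen(reads, start, end):
    """Return the names of the reads that survive the start/end filter."""
    return [r.name for r in reads if passes_screen(r, start, end)]


# Hypothetical reads (not real data from the run in this thread).
reads = [
    AlignedSeq("read_a", 1044, 6091),
    AlignedSeq("read_b", 1044, 6109),
    AlignedSeq("read_c", 1044, 6202),
    AlignedSeq("read_d", 1044, 6409),
    AlignedSeq("read_e", 1079, 6420),
]

# Demanding end=6409 drops every read that stops short of that column.
kept_strict = screen(reads, start=1044, end=6409)

# Relaxing to end=6091 keeps every read that starts on time.
kept_relaxed = screen(reads, start=1044, end=6091)

print(kept_strict)
print(kept_relaxed)
```

In this toy set, the strict end position keeps a single read while the relaxed one keeps four, which mirrors the kind of drop seen in the thread when end is set near the top of the end-position distribution.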

Pat

Thanks Pat!