screen.seqs leaves a few wrong sequences?

Hi, this post comes with the usual disclaimer: I’m a newbie and I’m doing my best to figure things out.

My question concerns the screen.seqs command.

I ran these two commands:

screen.seqs(fasta=allsamples.unique.align, start=6428, end=23444)
summary.seqs(fasta=allsamples.unique.good.align, processors=1)

I expected the summary to cover only sequences starting at alignment position 6428 and ending at 23444 (i.e. what I had screened for), and that did mostly happen. There are just a few other sequences hanging around. Here is the output:


Using 1 processors.

             Start    End    NBases   Ambigs  Polymer  NumSeqs
Minimum:     6115     23444  377      0       3        1
2.5%-tile:   6426     23444  402      0       4        3043
25%-tile:    6428     23444  404      0       5        30428
Median:      6428     23444  409      0       5        60855
75%-tile:    6428     23444  429      0       6        91282
97.5%-tile:  6428     23444  430      0       7        118666
Maximum:     6428     25298  450      0       21       121708
Mean:        6425.29  23444  415.732  0       5.14926

# of Seqs: 121708

Output File Names:
allsamples.unique.good.summary

It took 156 secs to summarize 121708 sequences.

Does anyone know why there are sequences starting at 6115 in the “Minimum” line, or ending at 25298 in the “Maximum” line?

Thanks for any help, criticism, feedback.

Hi Carey,

screen.seqs is actually looking for sequences that start at or before 6428 and end at or after 23444. So your results make sense. When you do filter.seqs with trump=T those extra columns will go away.

Pat
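
For anyone who finds this thread later, here is a rough Python sketch of the rule Pat describes. It is not mothur code: passes_screen is a made-up helper, and the (start, end) pairs are put together from column values in the summary above purely for illustration (summary.seqs reports each column separately, so they need not be real sequences). The last two pairs are invented counterexamples.

```python
# Hypothetical helper, NOT part of mothur: it only mimics the rule
# "start at or before `start` AND end at or after `end`".
def passes_screen(seq_start, seq_end, start=6428, end=23444):
    """Return True if the sequence spans the whole requested region."""
    return seq_start <= start and seq_end >= end


# Illustrative (start, end) pairs built from values in the summary output.
candidates = [(6115, 23444), (6428, 23444), (6428, 25298)]
kept = [pair for pair in candidates if passes_screen(*pair)]

# Invented counterexamples: one starts too late, one ends too early.
too_short = [(6500, 23444), (6428, 23000)]
rejected = [pair for pair in too_short if not passes_screen(*pair)]

print("kept:", kept)
print("rejected:", rejected)
```

So a sequence that starts at 6115 or ends at 25298 still passes, because it covers positions 6428 through 23444 with room to spare.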