filter.seqs

I think I lose a lot of length in the filter.seqs command.
This is the output of summary.seqs before screen.seqs:
             start    end      Nbases   ambig  polymer  numseqs
minimum      1044     1046     1        0      1        1
2.5%-tile    6237     16427    249      0      4        356
25%-tile     6428     22612    403      0      5        3552
median       6428     25298    406      0      5        7103
75%-tile     6430     25298    423      0      5        10654
97.5%-tile   7193     25298    430      0      6        13850
maximum      43116    43116    497      0      7        14205
mean         6633.32  24105.4  401.895  0      5.03464
unique seqs  11761
total seqs   14205

Then I ran screen.seqs with the parameters optimize=start-minlength, criteria=85 and got:

             start    end      Nbases   ambig  polymer  numseqs
minimum      5707     22091    391      0      4        1
2.5%-tile    6232     22580    395      0      4        258
25%-tile     6428     25298    404      0      5        2575
median       6428     25298    407      0      5        5149
75%-tile     6428     25298    423      0      5        7723
97.5%-tile   7192     25298    431      0      6        10040
maximum      7192     26918    497      0      8        10297
mean 61641,64 413,055
unique seqs  8440
total seqs   10297

All fine up to here. Then I ran filter.seqs(fasta=current, vertical=T, trump=.) and got:
             start    end      Nbases   ambig  polymer  numseqs
minimum      1        780      304      0      3        1
2.5%-tile    1        784      312      0      4        258
25%-tile     24       784      323      0      4        2575
median       24       784      324      0      5        5149
75%-tile     24       784      342      0      5        7723
97.5%-tile   25       784      359      0      6        10040
maximum      27       784      371      0      8        10297
mean         22.0251  783.993  329.348
unique seqs  8440
total seqs   10297

The mean Nbases has dropped from 413 to 329 bp. Is this normal? What’s happening?

Thanks in advance.

Why not try…

screen.seqs(fasta=, group=, name=, optimize=start, criteria=85, end=25298)

Because sequences vary in their number of insertions and deletions within the same alignment region, sequence length and alignment length aren’t necessarily the same thing.

Pat
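
To illustrate the point about sequence length versus alignment length, here is a toy sketch in Python. It is not mothur code; the two aligned sequences and the `summarize` helper are made up for illustration. Both sequences cover the same alignment columns, yet they contain different numbers of bases because one of them has internal gaps.

```python
def summarize(aligned):
    """Return (start, end, nbases) for one aligned sequence.

    Columns are 1-based. '-' is treated as an alignment gap and '.' as
    padding outside the read (an assumption of this toy example).
    """
    base_cols = [i + 1 for i, ch in enumerate(aligned) if ch not in "-."]
    return base_cols[0], base_cols[-1], len(base_cols)


# Hypothetical aligned sequences spanning the same columns.
seq_a = "..ACGTACGTAC.."   # no internal gaps
seq_b = "..AC--AC-TAC.."   # three internal gaps (deletions relative to seq_a)

for name, seq in (("seq_a", seq_a), ("seq_b", seq_b)):
    start, end, nbases = summarize(seq)
    print(name, "start", start, "end", end, "nbases", nbases)
```

Both sequences start at column 3 and end at column 12, but they hold 10 and 7 bases respectively, so filtering on alignment position and filtering on length are not interchangeable.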

Thank you.
I will try that, but I don’t really like it, because I sequenced from the forward primer, so the sequences should start at the same point, not end at the same point.
Anyway, there is something I don’t understand. In theory, when I run filter.seqs(fasta=, vertical=T, trump=.) I am only removing the columns that have a “-” in every sequence and the columns that have a “.” in any sequence. So I would expect this to shorten only the alignment, not the number of bases, yet Nbases drops by more than 100 bp.

By the way, when is a “.” inserted in a sequence? I read that it means missing data, but I don’t really understand what that means.

Thank you.

Hi Anna,

You can choose your start position instead of your end position, e.g. ‘start=19854’ instead of ‘end=25298’.

The ‘.’ means that it is not known what nucleotide/gap follows or precedes the sequence. By removing all columns with a period (which should only be found at the ends of sequences), you will inevitably chop off portions of the longer sequences.

Cheers,
Stephen
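
A toy sketch of why this costs bases, written in Python rather than mothur. The `toy_filter` function only imitates the behaviour described in this thread (drop any column containing the trump character, and drop columns that are gaps in every sequence); it is an assumption-laden illustration, not mothur's actual implementation, and the sequences are invented.

```python
def toy_filter(seqs, trump=".", vertical=True):
    """Drop columns from equal-length aligned sequences.

    A column is removed if any sequence has the trump character there,
    or (when vertical is True) if every sequence has a gap there.
    """
    keep = []
    for col in range(len(seqs[0])):
        column = [s[col] for s in seqs]
        if trump in column:
            continue  # one '.' is enough to lose the column for everyone
        if vertical and all(ch in "-." for ch in column):
            continue  # all-gap column carries no information
        keep.append(col)
    return ["".join(s[c] for c in keep) for s in seqs]


def nbases(seq):
    """Count real bases, ignoring '-' and '.'."""
    return sum(ch not in "-." for ch in seq)


# Hypothetical alignment: reads start and end at different columns.
aligned = [
    "ACGTACGTACGT",  # long read covering every column
    "ACGTACGT....",  # shorter read, padded with '.' at the end
    "..GTAC-TACGT",  # read that starts two columns later
]

filtered = toy_filter(aligned)
for before, after in zip(aligned, filtered):
    print(before, nbases(before), "->", after, nbases(after))
```

In this example the long read goes from 12 bases to 6, because every column where a shorter read has a ‘.’ is removed from all sequences. The same effect would explain Nbases dropping after filter.seqs with trump=. when the reads do not all start and end at the same alignment positions.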