settings for screen.seqs

Hi,
I’m hoping for some input on the settings for screen.seqs.

I’ve run align.seqs (fasta=merged.shhh.trim.unique.fasta, reference=silva.bacteria/silva.bacteria.fasta, flip=t, processors=2) and get

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1044 4755 121 0 3 1
2.5%-tile: 6428 21918 323 0 4 7803
25%-tile: 6428 25511 449 0 4 78021
Median: 6428 25513 450 0 5 156041
75%-tile: 6428 26149 454 0 5 234061
97.5%-tile: 6428 26158 461 0 5 304279
Maximum: 41774 43116 477 0 8 312080
Mean: 6421.78 25472.5 442.916 0 4.7173

# of unique seqs: 35063

total # of seqs: 312080

I first screened with screen.seqs(fasta=merged.shhh.trim.unique.align, name=merged.shhh.trim.names, group=merged.shhh.groups, start=6428, minlength=220, processors=2) and got

Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 6428 21976 336 0 4 6850
25%-tile: 6428 25511 449 0 4 68498
Median: 6428 25513 450 0 5 136995
75%-tile: 6428 26149 454 0 5 205492
97.5%-tile: 6428 26162 460 0 5 267139
Maximum: 10398 131077 1374 0 8 273988
Mean: 6406.94 25509.4 444.016 0 4.69926

# of unique seqs: 24819

total # of seqs: 273988

I tried several ways of getting rid of the oddly long seqs, and based on another post in the forum, this seems to do the trick:

screen.seqs(fasta=merged.shhh.trim.unique.align, name=merged.shhh.trim.unique.names, group=merged.shhh.groups, start=6428, end=25511, minlength=220, processors=2)

Start End NBases Ambigs Polymer NumSeqs
Minimum: 6426 25511 423 0 4 1
2.5%-tile: 6428 25511 447 0 4 6309
25%-tile: 6428 25511 449 0 4 63082
Median: 6428 26147 450 0 5 126164
75%-tile: 6428 26149 454 0 5 189246
97.5%-tile: 6428 26162 457 0 5 246019
Maximum: 6428 26855 477 0 8 252327
Mean: 6410.42 25839 450.919 0 4.72901

# of unique seqs: 10345

total # of seqs: 252327

Are these settings and the resulting data ok or are there other things I might try?
Thank you, Sandra

…more specifically, I don’t understand why screening on minlength and start position apparently introduces these long sequences (Maximum: 10398 131077 1374 0 8 273988) that were not there in the original alignment. Adding a maxlength doesn’t remove them; I’ve only managed to remove them by adding the end position, but to me it looks like this only removes the shorter seqs that were otherwise fine.
I’ve probably simply misunderstood the command, any input is much appreciated!
Best, Sandra

Maximum: 10398 131077 1374 0 8 273988 seems odd. Would you mind sending your logfile and input files to mothur.bugs@gmail.com?

Also, you may want to adjust your end value so you don’t remove as many seqs.

screen.seqs(fasta=merged.shhh.trim.unique.align, name=merged.shhh.trim.unique.names, group=merged.shhh.groups, start=6428, end=26162, minlength=220, processors=2)

Thanks for the input!
I tried running screen.seqs(fasta=merged.shhh.trim.unique.align, name=merged.shhh.trim.unique.names, group=merged.shhh.groups, start=6428, end=26162, minlength=220, processors=2), but this eliminates even more seqs:

Start End NBases Ambigs Polymer NumSeqs
Minimum: 6428 26162 442 0 4 1
2.5%-tile: 6428 26162 447 0 4 173
25%-tile: 6428 26162 449 0 4 1727
Median: 6428 26162 449 0 4 3453
75%-tile: 6428 26162 450 0 4 5179
97.5%-tile: 6428 26162 451 0 5 6733
Maximum: 6428 26855 476 0 7 6905
Mean: 6428 26161.4 449.393 0 4.12918

# of unique seqs: 729

total # of seqs: 6905

I don’t really understand what’s going on; the logfiles look ok to me.
I’ve uploaded the files to the wiki, thank you very much for your help again!
Best, Sandra

Sorry for the confusion. You were right with end=25511. I will take a look at the strange maximum values.
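
To illustrate why end=26162 removed more sequences than end=25511, here is a minimal Python sketch of the screening rule as I understand it from the screen.seqs documentation: a sequence is kept only if it starts at or before `start`, ends at or after `end`, and has at least `minlength` bases. This is not mothur's code. The `AlignedSeq` type, the `passes_screen` and `kept` helpers, and the sequence names "a" to "d" are hypothetical; the coordinates are borrowed from the summary tables above.

```python
from typing import NamedTuple, Optional


class AlignedSeq(NamedTuple):
    name: str    # hypothetical sequence name
    start: int   # first alignment column that holds a base
    end: int     # last alignment column that holds a base
    nbases: int  # number of bases, gaps excluded


def passes_screen(seq: AlignedSeq,
                  start: Optional[int] = None,
                  end: Optional[int] = None,
                  minlength: Optional[int] = None) -> bool:
    """Assumed rule: keep a sequence only if it covers [start, end]."""
    if start is not None and seq.start > start:
        return False  # begins after the required start column
    if end is not None and seq.end < end:
        return False  # stops before the required end column
    if minlength is not None and seq.nbases < minlength:
        return False  # too few bases
    return True


def kept(seqs, **cutoffs):
    """Names of the sequences that survive the given cutoffs."""
    return [s.name for s in seqs if passes_screen(s, **cutoffs)]


# Illustrative sequences using coordinates seen in the summaries above.
seqs = [
    AlignedSeq("a", 6428, 25511, 449),
    AlignedSeq("b", 6428, 26149, 454),
    AlignedSeq("c", 6428, 26162, 457),
    AlignedSeq("d", 6428, 21918, 323),
]

kept_25511 = kept(seqs, start=6428, end=25511, minlength=220)
kept_26162 = kept(seqs, start=6428, end=26162, minlength=220)

print(kept_25511)  # ['a', 'b', 'c']
print(kept_26162)  # ['c']
```

Under this rule, raising `end` makes the screen stricter, not looser: every sequence that stops short of the new end column is dropped, which matches the drop from 252327 to 6905 seqs reported above.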

Great, thanks!