Hi,
I’m hoping for some input on the settings for screen.seqs.
I ran align.seqs(fasta=merged.shhh.trim.unique.fasta, reference=silva.bacteria/silva.bacteria.fasta, flip=t, processors=2) and got:
            Start    End      NBases   Ambigs   Polymer  NumSeqs
Minimum:    1044     4755     121      0        3        1
2.5%-tile:  6428     21918    323      0        4        7803
25%-tile:   6428     25511    449      0        4        78021
Median:     6428     25513    450      0        5        156041
75%-tile:   6428     26149    454      0        5        234061
97.5%-tile: 6428     26158    461      0        5        304279
Maximum:    41774    43116    477      0        8        312080
Mean:       6421.78  25472.5  442.916  0        4.7173
# of unique seqs: 35063
total # of seqs:  312080
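The NumSeqs column in these summaries appears to be the 1-based rank, counting every sequence rather than just the uniques, at which each percentile row is read off. For example, 2.5% of 312080 is 7802, and the 2.5%-tile row sits at 7803. The rule rank = floor(total × p) + 1 reproduces the NumSeqs column of all three tables in this post, but it is inferred from those numbers, not taken from mothur's source. A minimal Python sketch of that reading follows; percentile_ranks and weighted_value_at are hypothetical helper names.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Percentile rows as they appear in the summary tables (exact fractions avoid float rounding).
PERCENTILES = {
    "2.5%-tile": Fraction("0.025"),
    "25%-tile": Fraction("0.25"),
    "Median": Fraction("0.5"),
    "75%-tile": Fraction("0.75"),
    "97.5%-tile": Fraction("0.975"),
}


def percentile_ranks(total: int) -> Dict[str, int]:
    """1-based rank for each summary row, assuming rank = floor(total * p) + 1.

    Assumption: reconstructed from the NumSeqs column, not from mothur's code.
    """
    ranks = {"Minimum": 1}
    for label, p in PERCENTILES.items():
        # Integer floor of total * p, then shift to a 1-based rank.
        ranks[label] = total * p.numerator // p.denominator + 1
    ranks["Maximum"] = total
    return ranks


def weighted_value_at(values_and_counts: List[Tuple[int, int]], rank: int) -> int:
    """Value at a 1-based rank after expanding each unique sequence by its
    abundance (e.g. the count of names for that unique in a names file)."""
    seen = 0
    for value, count in sorted(values_and_counts):
        seen += count
        if seen >= rank:
            return value
    raise ValueError("rank exceeds total number of sequences")


if __name__ == "__main__":
    for label, rank in percentile_ranks(312080).items():
        print(f"{label:<11} {rank}")
```

Because the ranks count redundant copies, the percentile rows describe all 312080 reads rather than the 35063 unique sequences.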
I first screened with screen.seqs(fasta=merged.shhh.trim.unique.align, name=merged.shhh.trim.names, group=merged.shhh.groups, start=6428, minlength=220, processors=2) and got
            Start    End      NBases   Ambigs   Polymer  NumSeqs
Minimum:    0        0        0        0        1        1
2.5%-tile:  6428     21976    336      0        4        6850
25%-tile:   6428     25511    449      0        4        68498
Median:     6428     25513    450      0        5        136995
75%-tile:   6428     26149    454      0        5        205492
97.5%-tile: 6428     26162    460      0        5        267139
Maximum:    10398    131077   1374     0        8        273988
Mean:       6406.94  25509.4  444.016  0        4.69926
# of unique seqs: 24819
total # of seqs:  273988
I tried several ways of getting rid of the oddly long sequences, and based on another post in the forum, this seems to do the trick:
screen.seqs(fasta=merged.shhh.trim.unique.align, name=merged.shhh.trim.unique.names, group=merged.shhh.groups, start=6428, end=25511, minlength=220, processors=2)
            Start    End      NBases   Ambigs   Polymer  NumSeqs
Minimum:    6426     25511    423      0        4        1
2.5%-tile:  6428     25511    447      0        4        6309
25%-tile:   6428     25511    449      0        4        63082
Median:     6428     26147    450      0        5        126164
75%-tile:   6428     26149    454      0        5        189246
97.5%-tile: 6428     26162    457      0        5        246019
Maximum:    6428     26855    477      0        8        252327
Mean:       6410.42  25839    450.919  0        4.72901
# of unique seqs: 10345
total # of seqs:  252327
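As I understand the mothur wiki's description of screen.seqs, start, end and minlength act as independent cut-offs: a sequence is removed if it starts after the start position, ends before the end position, or has fewer than minlength bases. That would be consistent with the last summary, where no Start exceeds 6428 and no End falls below 25511. The Python sketch below only illustrates that logic and is not mothur's implementation; AlignedSeq and passes_screen are hypothetical names, and the four records are made-up values echoing rows of the summaries above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlignedSeq:
    name: str
    start: int   # first aligned position with a base ("Start" in the summary)
    end: int     # last aligned position with a base ("End" in the summary)
    nbases: int  # number of bases, gaps excluded ("NBases" in the summary)


def passes_screen(seq: AlignedSeq, start: Optional[int] = None,
                  end: Optional[int] = None,
                  minlength: Optional[int] = None) -> bool:
    """Hypothetical re-statement of the start/end/minlength cut-offs:
    keep a sequence only if it starts at or before `start`, ends at or
    after `end`, and has at least `minlength` bases. Unset criteria are skipped."""
    if start is not None and seq.start > start:
        return False
    if end is not None and seq.end < end:
        return False
    if minlength is not None and seq.nbases < minlength:
        return False
    return True


# Made-up records loosely based on rows of the summaries above.
seqs = [
    AlignedSeq("a", 6428, 25511, 449),   # typical read: kept
    AlignedSeq("b", 6428, 21918, 323),   # ends before 25511
    AlignedSeq("c", 41774, 43116, 477),  # starts after 6428
    AlignedSeq("d", 1044, 4755, 121),    # ends early and is shorter than 220 bases
]

kept = [s.name for s in seqs if passes_screen(s, start=6428, end=25511, minlength=220)]
print(kept)  # ['a']
```

Dropping end=25511 from the call keeps "b" as well, which mirrors how the first screen.seqs run (start and minlength only) still let reads ending around 21900 through.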
Are these settings and the resulting data OK, or are there other things I might try?
Thank you, Sandra