Hi,
I am having some trouble when running screen.seqs and filter.seqs on my 16S seqs after alignment to the SILVA reference alignment.
Here is the summary of my alignment:
             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1044     1103     10       0       2        1
2.5%-tile:   1044     6332     313      0       4        5658
25%-tile:    1044     8508     397      0       5        56573
Median:      1044     9890     424      0       5        113145
75%-tile:    1044     9994     434      0       5        169717
97.5%-tile:  1044     10351    454      0       6        220631
Maximum:     43097    43116    485      0       8        226288
Mean:        1042.56  9323.6   410.339  0       5.00103
# of unique seqs:  62445
total # of seqs:   226288
Then I ran screen.seqs as follows:
screen.seqs(fasta=2d.shhh.trim.pick.unique.align, name=2d.shhh.trim.pick.names, group=2d.shhh.groups, end=6332, minlength=300, processors=2)
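To make clear what I expect those two criteria to do, here is a minimal Python sketch (not mothur code). It assumes that end= drops any sequence whose last base falls before that alignment column and that minlength= drops any sequence with fewer bases than the cutoff. The function names and the toy sequences are made up for illustration.

```python
def aligned_span(seq):
    """Return (start, end, nbases) in 1-based alignment columns.

    Both '.' and '-' are treated as gap characters.
    """
    cols = [i for i, ch in enumerate(seq, start=1) if ch not in ".-"]
    if not cols:
        return (0, 0, 0)
    return (cols[0], cols[-1], len(cols))


def screen(seqs, end=None, minlength=None):
    """Keep sequences that reach column `end` and have >= `minlength` bases."""
    kept = {}
    for name, seq in seqs.items():
        _, seq_end, nbases = aligned_span(seq)
        if end is not None and seq_end < end:
            continue  # sequence stops before the required end column
        if minlength is not None and nbases < minlength:
            continue  # too few bases once gaps are ignored
        kept[name] = seq
    return kept


# Hypothetical 15-column toy alignment, not my real data.
toy = {
    "seqA": "..ACGT-ACGTAC..",  # ends at column 13, 10 bases
    "seqB": "..ACGTAC.......",  # ends at column 8, 6 bases
    "seqC": "....GT-ACGTACGT",  # ends at column 15, 10 bases
}

print(sorted(screen(toy, end=10, minlength=8)))  # ['seqA', 'seqC']
```

In this toy case seqB is removed because it ends too early and is too short, while the start positions are not checked at all, just as in my command above.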
Here is a summary of the output of the screen.seqs command:
             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1044     6332     300      0       3        1
2.5%-tile:   1044     6388     317      0       4        1485
25%-tile:    1044     7941     370      0       5        14849
Median:      1044     9820     412      0       5        29698
75%-tile:    1044     9914     430      0       5        44546
97.5%-tile:  1044     10303    455      0       6        57910
Maximum:     3616     13875    485      0       8        59394
Mean:        1044.53  8966.52  399.971  0       5.03881
# of Seqs:   59394
Next I ran filter.seqs with the trump option:
filter.seqs(fasta=2d.shhh.trim.pick.unique.good.align, vertical=T, trump=., processors=2)
Here is the summary of the filter.seqs output:
             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1        402      131      0       3        1
2.5%-tile:   1        404      141      0       3        1485
25%-tile:    1        404      151      0       4        14849
Median:      1        404      153      0       4        29698
75%-tile:    1        404      157      0       5        44546
97.5%-tile:  11       404      168      0       6        57910
Maximum:     26       404      194      0       8        59394
Mean:        2.87526  404      154.161  0       4.32683
# of Seqs:   59394
As you can see, I am losing a lot of length after running filter.seqs: the alignment is cut down to 404 columns and the median sequence to 153 bases. Is there any way to improve on this so that I keep around 300-400 bases of the alignment without reducing my total number of sequences too much? Any advice would be much appreciated, and I apologise for the long post!
Many thanks