Problem with filter.seqs

Hi,

I'm working on pyrosequencing 16S data. I'm having a problem with the filter.seqs output.

Here's the summary after each step:


Before the alignment:

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 20 20 0 1 1
2.5%-tile: 1 66 66 0 3 6054
25%-tile: 1 552 552 0 5 60538
Median: 1 701 701 0 5 121075
75%-tile: 1 751 751 0 6 181612
97.5%-tile: 1 775 775 1 7 236095
Maximum: 1 909 909 8 32 242148
Mean: 1 613.308 613.308 0.0839734 5.29897

# of Seqs: 242148

After alignment using: align.seqs(fasta=all.fasta, reference=silva.seed_v119.align, processors=8, flip=t)

Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 6212 8357 19 0 2 6054
25%-tile: 6214 34102 536 0 5 60538
Median: 6237 35131 699 0 5 121075
75%-tile: 8190 35134 750 0 6 181612
97.5%-tile: 32825 35139 775 1 7 236095
Maximum: 43116 43116 847 8 32 242148
Mean: 9306.95 32270.2 598.656 0.0821811 5.21383

# of Seqs: 242148

After screening using: screen.seqs(fasta=all.align, optimize=end, optimize=start, processors=4)

Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 6212 8189 77 0 4 5449
25%-tile: 6214 32520 600 0 5 54487
Median: 6217 34479 723 0 5 108974
75%-tile: 6413 35134 752 0 6 163460
97.5%-tile: 14965 35135 776 1 7 212498
Maximum: 15968 38339 847 8 32 217946
Mean: 7084.17 31838.3 644.141 0.0863975 5.39379

# of Seqs: 217946


After filtering using: filter.seqs(fasta=all.good.align, vertical=T, trump=., processors=2)

I get this message:

Length of filtered alignment: 0
Number of columns removed: 50000
Length of the original alignment: 50000
Number of sequences used to construct filter: 217946

and the summary is:

Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 0 0 0 0 1 5449
25%-tile: 0 0 0 0 1 54487
Median: 0 0 0 0 1 108974
75%-tile: 0 0 0 0 1 163460
97.5%-tile: 0 0 0 0 1 212498
Maximum: -1 -1 0 0 1 217946
Mean: 0 0 0 0 1

# of Seqs: 217946

I also tested screen.seqs with other parameters (starting with 35139, ending with 32825, minlength=700), but without success. Any ideas or other suggestions?

Hi there,

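The problem is your setting for screen.seqs. With trump=., filter.seqs removes every column in which any sequence has a “.” character, and your screened alignment still has a “.” in every column, so all 50000 columns get removed. To get around this you want to do this: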

screen.seqs(fasta=all.align, optimize=start-end, criteria=95, processors=4)
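
To see why the filtered alignment came out with length 0, here is a rough Python sketch of what a trump character does. It is not mothur's code: `trump_filter` is a made-up name, the toy sequences are invented, and the sketch ignores vertical=T. It also assumes that a read which failed to align (Start and End of -1 in the summary) is all “.” characters.

```python
def trump_filter(aligned, trump="."):
    """Drop every column in which at least one sequence has the trump character.

    `aligned` is a list of equal-length aligned sequences (toy data here).
    """
    length = len(aligned[0])
    # A column survives only if no sequence has the trump character there.
    keep = [all(seq[i] != trump for seq in aligned) for i in range(length)]
    return ["".join(ch for ch, k in zip(seq, keep) if k) for seq in aligned]


# Three reads that overlap in columns 2 to 7.
good = ["..ACGT-A..", ".GACGTTA..", "..ACG-TAC."]
print(trump_filter(good))

# One read that is "." everywhere (assumed model of a failed alignment)
# puts a "." in every column, so nothing survives.
print(trump_filter(good + ["." * 10]))
```

A single read like that, or a handful of reads that do not overlap the rest, is enough to wipe out every column, which is why the screening step has to get rid of them first.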


If you know something more about the region you sequenced, you could probably be more specific. For example...

screen.seqs(fasta=all.align, end=32520, optimize=start, criteria=95, processors=4)
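
As a rough picture of what optimize with criteria=95 is doing, the sketch below picks the start and end positions that 95% of the reads satisfy. This is an illustration under assumptions, not mothur's implementation: `optimize_cutoffs` is a hypothetical helper, the rounding is my own choice, and the two cutoffs are chosen independently of each other.

```python
import math


def optimize_cutoffs(starts, ends, criteria=95):
    """Return (start_cut, end_cut) so that at least `criteria` percent of
    reads start at or before start_cut, and at least `criteria` percent
    end at or after end_cut. Each cutoff is computed on its own."""
    n = len(starts)
    k = math.ceil(n * criteria / 100)            # reads that must pass each cutoff
    start_cut = sorted(starts)[k - 1]            # k reads start at or before this
    end_cut = sorted(ends, reverse=True)[k - 1]  # k reads end at or after this
    return start_cut, end_cut


# Toy data: 19 reads covering the same region plus one outlier.
starts = [6212] * 19 + [32825]
ends = [35134] * 19 + [8357]

print(optimize_cutoffs(starts, ends, criteria=95))
print(optimize_cutoffs(starts, ends, criteria=100))
```

With the lower criteria value the outlier no longer decides the cutoffs, so it fails the screen instead of dragging the start and end positions along with it.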

Pat