filter.seqs removes every column

We have sequenced 16S amplicons on an Illumina platform and are now analysing the sequences. At the filter.seqs step, every column is removed. I have tried different criteria in screen.seqs, but the result is always the same. So far I have only used the vertical=T parameter, but it would be great to remove the dots as well. Is there anything I should try or keep in mind?


Can you post the results of summary.seqs for the input to screen.seqs?

My input to screen.seqs:

             Start    End      NBases  Ambigs  Polymer  NumSeqs
Minimum:     0        0        0       0       1        1
2.5%-tile:   1044     1056     5       0       2        1045
25%-tile:    21917    22545    48      0       3        10446
Median:      31189    34102    74      0       3        20891
75%-tile:    31189    34113    75      0       3        31336
97.5%-tile:  42573    43061    77      0       4        40737
Maximum:     43116    43116    128     0       8        41781
Mean:        26207.9  28138.7  60.867  0       3.17367

# of Seqs: 41781

95% of the sequences are between 74 and 76 bases long, so they are quite short, and I’m not sure that aligning against the Silva reference alignment is the right choice here. Maybe Greengenes would be better?

The problem is that your sequences do not overlap with each other - I doubt greengenes will be better. I’d suggest using start=31189, end=34000 in screen.seqs and then running filter.seqs.
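To make the overlap point concrete, here is a small Python sketch (not mothur code). It assumes the usual reading of the screen.seqs parameters: start drops sequences that begin after that position, and end drops sequences that finish before it. The read names and coordinate pairs are made-up toy values loosely based on the percentiles in the summary above, and both helper functions are hypothetical.

```python
# Toy model: each read is represented only by its (start, end) alignment columns.
# Names and coordinate pairs are illustrative, not real data.

def screen(reads, start, end):
    """Keep reads that begin at or before `start` and finish at or after `end`."""
    return {name: (s, e) for name, (s, e) in reads.items()
            if s <= start and e >= end}

def shared_columns(reads):
    """Count alignment columns that are covered by every read."""
    if not reads:
        return 0
    left = max(s for s, _ in reads.values())    # latest start
    right = min(e for _, e in reads.values())   # earliest end
    return max(0, right - left + 1)

reads = {
    "read_a": (1044, 1056),
    "read_b": (21917, 22545),
    "read_c": (31189, 34102),
    "read_d": (31189, 34113),
    "read_e": (42573, 43061),
}

# Before screening, no column is covered by all reads.
print(shared_columns(reads))

# After screening on the suggested coordinates, only overlapping reads remain.
kept = screen(reads, start=31189, end=34000)
print(sorted(kept), shared_columns(kept))
```

In this toy set only read_c and read_d survive the screen, and they share one region of the alignment, so filter.seqs has columns left to keep.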

Thank you for the advice, it helped a lot. One more question: which reference database would be best for classifying such short sequences?

Give the RDP trainset or the greengenes one a shot. Just don’t expect them to classify your data all the way to family or genus.