We sequenced 16S amplicons on an Illumina platform and are now analysing the sequences. In the filter.seqs step, every column is removed. I have tried different criteria in screen.seqs, but the result is always the same. So far I have only used the vertical=T parameter, but it would be great to remove the dots as well. Is there anything I should try or keep in mind?
95% of the sequences are 74-76 bases long, so they are quite short, and I’m not sure that aligning against the Silva reference alignment is the right choice here. Maybe Greengenes would be better?
The problem is that your sequences do not overlap with each other, so I doubt Greengenes will be better. I’d suggest using start=31189, end=34000 in screen.seqs and then running filter.seqs.
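To illustrate why non-overlapping sequences can leave nothing after filtering, here is a minimal Python sketch. It is not mothur code. The toy alignment and the function names `trump_filter` and `screen` are made up for illustration. It assumes a filter that drops every column in which at least one sequence has a dot (as I understand trump=. to work in filter.seqs), and a screen that keeps only sequences starting at or before `start` and ending at or after `end`.

```python
# Toy illustration only: this is not mothur code, and the sequences are invented.

def trump_filter(alignment, trump="."):
    """Keep only the columns in which no sequence has the trump character."""
    n_cols = len(alignment[0])
    keep = [c for c in range(n_cols)
            if all(seq[c] != trump for seq in alignment)]
    return ["".join(seq[c] for c in keep) for seq in alignment]

def screen(alignment, start, end):
    """Keep sequences that start at or before `start` and end at or after `end`.

    Positions are 1-based alignment columns; dots and dashes are not bases.
    """
    kept = []
    for seq in alignment:
        positions = [i + 1 for i, ch in enumerate(seq) if ch not in ".-"]
        if positions and positions[0] <= start and positions[-1] >= end:
            kept.append(seq)
    return kept

alignment = [
    "..ACGTAC....",  # bases in columns 3-8
    "..AC-TACG...",  # bases in columns 3-9
    "........GTCA",  # bases in columns 9-12, no column shared with all others
]

# Without screening, every column has a dot in at least one sequence.
unscreened = trump_filter(alignment)

# Screening first removes the sequence that does not span columns 3-8.
screened = trump_filter(screen(alignment, start=3, end=8))

print(unscreened)
print(screened)
```

In the toy data a single sequence that misses the shared region is enough to empty the alignment. Screening on start and end first leaves only sequences that cover the same columns, so the filter has something to keep.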
Thank you for the advice, it helped a lot. One more question: which reference database would be best for classifying such short sequences?