Hello,
We’re running mothur on 16S rRNA gene sequences produced on the Illumina platform. I’ve tried to optimize my screen.seqs step, and I no longer run into the error where filter.seqs removes every column. However, the sequences that come out of filter.seqs are only about a third as long as those after screen.seqs (median NBases of 28 versus 79). Below are my commands and the summary.seqs output after screen.seqs and after filter.seqs:
screen.seqs(fasta=filename.trim.unique.align, name=filename.trim.names, group=filename.groups, minlength=75, start=40877, end=41432)
summary.seqs()
            Start   End     NBases   Ambigs  Polymer  NumSeqs
Minimum:    40158   41432   75       0       2        1
2.5%-tile:  40339   41444   76       0       3        5644
25%-tile:   40727   41488   76       0       3        56431
Median:     40781   41488   79       0       3        112861
75%-tile:   40877   41547   79       0       3        169291
97.5%-tile: 40877   41562   79       0       5        220078
Maximum:    40877   42531   148      0       6        225721
Mean:       40825   41513   77.9236  0       3.25359
# of unique seqs: 4979
total # of seqs: 225721
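To make the start/end/minlength criteria and the NBases column concrete, here is a minimal Python sketch of how such per-sequence numbers could be computed from one aligned sequence. It is not mothur's code: the names summarize and passes_screen are hypothetical, and it assumes that both '-' and '.' count as gaps rather than bases and that positions are 1-based alignment columns.

```python
# Hypothetical sketch, not mothur's implementation.
# ASSUMPTIONS: '-' and '.' are both gap characters (not counted as bases),
# and positions are 1-based alignment columns, as in summary.seqs output.
from typing import Tuple

GAP_CHARS = {"-", "."}


def summarize(aligned: str) -> Tuple[int, int, int]:
    """Return (start, end, nbases) for one aligned sequence."""
    # 1-based columns that hold a real base rather than a gap
    base_positions = [i + 1 for i, ch in enumerate(aligned) if ch not in GAP_CHARS]
    if not base_positions:
        return (0, 0, 0)
    return (base_positions[0], base_positions[-1], len(base_positions))


def passes_screen(aligned: str, start: int, end: int, minlength: int) -> bool:
    """Keep a sequence only if it starts at or before `start`, ends at or
    after `end`, and contains at least `minlength` bases."""
    s, e, n = summarize(aligned)
    return n > 0 and s <= start and e >= end and n >= minlength


if __name__ == "__main__":
    read = "..AC-GT--A.."
    print(summarize(read))                                   # (3, 10, 5)
    print(passes_screen(read, start=4, end=8, minlength=5))  # True
    print(passes_screen(read, start=2, end=8, minlength=5))  # False
```

Under these assumptions, the 12-column string above holds only 5 bases, so gaps would not inflate NBases; whether mothur counts exactly this way is worth confirming in its documentation.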
filter.seqs(fasta=filename.trim.unique.good.align, vertical=T, trump=.)
summary.seqs()
            Start   End     NBases   Ambigs  Polymer  NumSeqs
Minimum:    1       57      22       0       2        1
2.5%-tile:  1       65      25       0       2        5644
25%-tile:   1       65      25       0       2        56431
Median:     1       65      28       0       2        112861
75%-tile:   1       65      28       0       3        169291
97.5%-tile: 1       65      33       0       3        220078
Maximum:    5       65      39       0       5        225721
Mean:       1.00314 64.9997 27.1742  0       2.43922
# of unique seqs: 4979
total # of seqs: 225721
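A small Python sketch can show one way base counts can drop after column filtering. It models vertical=T as dropping columns that are gaps in every sequence, and trump=. as dropping any column where at least one sequence has a '.'. This follows the documented meaning of those options but is not mothur's implementation, and filter_columns and count_bases are hypothetical names.

```python
# Hypothetical sketch, not mothur's implementation.
# ASSUMPTIONS modeled on the documented filter.seqs options:
#   vertical=T -> drop columns that are a gap ('-' or '.') in every sequence
#   trump='.'  -> drop any column where at least one sequence has '.'
from typing import List

GAP_CHARS = {"-", "."}


def filter_columns(alignment: List[str], vertical: bool = True, trump: str = ".") -> List[str]:
    """Remove filtered columns from equal-length aligned sequences."""
    if not alignment:
        return []
    keep = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        if vertical and all(ch in GAP_CHARS for ch in column):
            continue  # empty in every sequence
        if trump and trump in column:
            continue  # at least one sequence carries the trump character here
        keep.append(col)
    return ["".join(seq[c] for c in keep) for seq in alignment]


def count_bases(seq: str) -> int:
    """Count non-gap characters."""
    return sum(ch not in GAP_CHARS for ch in seq)


if __name__ == "__main__":
    # Two 8-base reads that cover different, partly overlapping stretches
    reads = ["ACGTACGT....", "....ACGTACGT"]
    filtered = filter_columns(reads)
    print(filtered)                            # ['ACGT', 'ACGT']
    print([count_bases(s) for s in reads])     # [8, 8]
    print([count_bases(s) for s in filtered])  # [4, 4]
```

In this toy case each read keeps only the bases in columns that every read covers, so two 8-base reads shrink to 4 bases each. Reads that start and end at scattered alignment positions would lose any bases outside their shared overlap in the same way.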
How could the filter step shorten the sequences this much? Does the NBases count in the summary after screen.seqs include blank columns (.) as well as actual bases? If so, is there a way to screen out such short sequences in my screen.seqs step? If not, how do I keep my ~75-80 bp sequences?
Thank you very much!
~Alexa