I am having difficulty figuring out why screen.seqs is removing >99% of my reads. As per the SOP, I ran screen.seqs to keep the sequences that start at or before the start position and end at or after the end position. However, instead of keeping these sequences, it is eliminating them.
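To be explicit about the rule I expect screen.seqs to apply, here is a minimal Python sketch of my understanding of the start/end criteria. It is not mothur's actual code; the function name `passes_screen` and the coordinates are made up purely for illustration.

```python
def passes_screen(seq_start, seq_end, start, end):
    """Return True if an aligned read would be kept under the start/end rule.

    Assumption: a read is kept only if it begins at or before `start`
    AND finishes at or after `end` (both in alignment coordinates).
    """
    return seq_start <= start and seq_end >= end


# Hypothetical alignment coordinates, not taken from my data.
reads = {
    "read_a": (100, 500),  # spans the whole window
    "read_b": (120, 500),  # starts too late
    "read_c": (100, 480),  # ends too early
}

kept = [
    name
    for name, (s, e) in reads.items()
    if passes_screen(s, e, start=110, end=490)
]
print(kept)
```

Under this reading, a read has to satisfy both conditions at once, so missing either boundary is enough for it to be dropped.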
I am using Mothur 1.44.3.
Summary of the sequences after aligning to the reference database:
summary.seqs(fasta=current, count=current)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 7 10 1 0 1 1
2.5%-tile: 987 1665 121 0 4 75423
25%-tile: 1006 1668 122 0 4 754226
Median: 1007 1668 122 0 4 1508451
75%-tile: 1007 1672 123 0 4 2262676
97.5%-tile: 1028 1684 128 0 5 2941478
Maximum: 1718 1718 132 0 8 3016900
Mean: 1005 1669 122 0 4
# of unique seqs: 47954
total # of seqs: 3016900
screen.seqs(fasta=current, count=current, summary=current, start=1006, end=1684, maxhomop=8)
filter.seqs(fasta=current, vertical=T)
unique.seqs(fasta=current, count=current)
summary.seqs(fasta=current, count=current)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 219 118 0 3 1
2.5%-tile: 17 219 123 0 3 558
25%-tile: 17 219 123 0 3 5579
Median: 17 219 123 0 3 11158
75%-tile: 18 219 131 0 4 16736
97.5%-tile: 18 219 131 0 5 21757
Maximum: 26 238 132 0 7 22314
Mean: 17 219 126 0 3
# of unique seqs: 1456
total # of seqs: 22314
So, as you can see, it removed almost 3 million sequences.
I have opened the .align file produced by the alignment to the reference database, and the region between positions 987 and 1006 is a bit gappy for the majority of the sequences. The sequences are good after that, so I do not want to get rid of them. I would like to trim the dataset so that all sequences start and end at the designated positions. I have tried running pcr.seqs and I get exactly the same results. I have also tried running screen.seqs with different start positions (e.g., 987, 1007), but that too deletes most of my dataset. I've looked into trim.seqs, but that appears to be used only for trimming off adaptors, primers, etc.
I am lost as to how to move forward without losing these sequences. Any insight would be greatly appreciated.