Hello
I have another question about silva.seed_v119.align (I’m on Mothur 1.33.3).
For the pcr.seqs step, I know my primers are at positions 23447 and 43116 in the alignment, so here is what I did:
pcr.seqs(fasta=silva.seed_v119.align, start=23447, end=43116, keepdots=F, processors=1)
summary.seqs(fasta=silva.seed_v119.pcr.align)
and I get this:
            Start     End     Nbases    Ambigs    Polymer   NumSeqs
Minimum     1         19356   661       0         4         1
2.5%-tile   511       19669   985       0         4         376
25%-tile    511       19669   702       0         5         3753
Median      511       19669   704       0         5         7505
75%-tile    511       19669   709       0         6         11257
97.5%-tile  511       19669   760       2         7         14634
Maximum     1985      19669   1450      5         10        15009
Mean        510.997   19669   712.157   0.17243   5.36984
# of Seqs   15009
My questions are:
- Why is it that most sequences start at 511 and only a minority start at 1? It could be explained if there were some remaining dots, but there shouldn’t be any (keepdots=F).
- Should I just trim my new DB of anything that starts before 511 by doing: pcr.seqs(fasta=silva.seed_v119.pcr.align, start=511, end=43116, keepdots=F, processors=1) and trust that I can use the resulting DB, or is there something wrong with the DB?
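To make my reasoning explicit, here is a small Python sketch (not mothur code) of the coordinate arithmetic I had in mind. It assumes, and this is only my assumption, that pcr.seqs keeps alignment columns start through end (1-based, inclusive) and renumbers them from 1. The helper name to_original_column is made up for illustration.

```python
# Hypothetical helper, not part of mothur.
# Assumption: pcr.seqs keeps columns start..end (1-based, inclusive)
# and the trimmed alignment is renumbered starting at 1.
def to_original_column(trimmed_pos, start):
    """Map a position in the trimmed alignment back to the full alignment."""
    return start + trimmed_pos - 1

start, end = 23447, 43116

# Width of the trimmed alignment under the assumption above.
expected_width = end - start + 1
print(expected_width)                   # 19670

# Where a trimmed Start of 511 or 1 would sit in the full alignment.
print(to_original_column(511, start))   # 23957
print(to_original_column(1, start))     # 23447
```

If that assumption is right, a Start of 511 would simply mean that most sequences have no bases before column 23957 of the full alignment, but I am not sure this is how pcr.seqs actually numbers things.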
In order to understand what was going on I also did this (using the original silva.seed_v119.align downloaded from the SOP):
pcr.seqs(fasta=silva.seed_v119.align, start=20000, end=43116, keepdots=F, processors=1)
and I got this:
            Start      End     Nbases   Ambigs    Polymer   NumSeqs
Minimum     45         20803   729      0         4         1
2.5%-tile   45         21116   753      0         4         376
25%-tile    46         21116   770      0         5         3753
Median      46         21116   772      0         5         7505
75%-tile    46         21116   777      0         6         11257
97.5%-tile  46         21116   828      2         7         14634
Maximum     47         21116   1579     5         10        15009
Mean        46.00001   21116   780.2    0.18842   5.37278
# of Seqs   15009
Question:
Why do I get different end positions depending on the specified start position, and why do the sequences start at 45 (and not 1) with this last command? Is this a normal inconsistency, am I missing something about how pcr.seqs works, or is there something weird with the DB?
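To show the inconsistency in numbers, here is a short Python sketch (again not mothur code) comparing the End I expected with the End reported by summary.seqs for both runs. It assumes the trimmed alignment should be end - start + 1 columns wide, which is my guess and not something I have confirmed. The function name width_gap is made up for illustration.

```python
# Hypothetical check, not part of mothur.
# Assumption: trimming to columns start..end (1-based, inclusive) should
# give an alignment that is end - start + 1 columns wide.
def width_gap(start, end, observed_end):
    """Return (expected width, expected width minus observed End)."""
    expected = end - start + 1
    return expected, expected - observed_end

# (start, end, End reported by summary.seqs) for the two runs above.
runs = [
    (23447, 43116, 19669),
    (20000, 43116, 21116),
]

for start, end, observed_end in runs:
    expected, gap = width_gap(start, end, observed_end)
    print(f"start={start}: expected {expected}, observed {observed_end}, gap {gap}")
```

With the numbers above the gap is 1 for start=23447 but 2001 for start=20000, which is what I cannot explain.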
Thanks a lot for any answer!
Best