I’ve been analyzing low-biomass samples for which I sequenced V1-3 on 454. After running the alignment, I noticed that the sequences aligned in three different places.
             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1044     1046     2        0       1        1
2.5%-tile:   1044     1107     13       0       2        79
25%-tile:    5286     13862    31       0       2        781
Median:      5302     13862    286      0       4        1561
75%-tile:    42635    43116    292      0       4        2341
97.5%-tile:  43058    43116    297      0       5        3043
Maximum:     43103    43116    308      0       8        3120
Mean:        15889.8  21156.4  186.295  0       3.50577
# of unique seqs: 946
total # of seqs: 3120
If I look at the summary file, most of the sequences in the middle range are between 250 and 300 bases. The others are much shorter. By running screen.seqs with minlength=250, I only get the sequences in the middle range, but I’m still wondering if there is something else that has gone wrong.
- Has anyone else had data that aligned in several different places?
- For the different variable regions, is there an expected range they should fall into for alignment (i.e. start and end values)?
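In case it helps anyone looking at the same problem, here is a minimal sketch of how the summary file could be split into groups by alignment start position, to check which reads land where and how long they are. It assumes the summary file is tab-separated with a header row containing `seqname`, `start`, `end`, `nbases` and `numSeqs` columns. The function names, the `max_gap` threshold of 1000 alignment columns, and the example rows are all made up for illustration and are not my real data.

```python
import csv
import io


def read_summary(text):
    """Parse tab-separated summary text (assumed column names) into dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "name": row["seqname"],
            "start": int(row["start"]),
            "end": int(row["end"]),
            "nbases": int(row["nbases"]),
            "count": int(row["numSeqs"]),
        }
        for row in reader
    ]


def cluster_by_start(rows, max_gap=1000):
    """Sort rows by start and open a new cluster whenever the gap between
    consecutive start positions exceeds max_gap (an arbitrary threshold)."""
    clusters = []
    for row in sorted(rows, key=lambda r: r["start"]):
        if clusters and row["start"] - clusters[-1][-1]["start"] <= max_gap:
            clusters[-1].append(row)
        else:
            clusters.append([row])
    return clusters


def describe(clusters):
    """Report the start range, total read count and length range per cluster."""
    return [
        {
            "start_min": c[0]["start"],
            "start_max": c[-1]["start"],
            "reads": sum(r["count"] for r in c),
            "nbases_min": min(r["nbases"] for r in c),
            "nbases_max": max(r["nbases"] for r in c),
        }
        for c in clusters
    ]


# Invented example rows, only loosely modelled on the ranges in the post.
EXAMPLE = (
    "seqname\tstart\tend\tnbases\tambigs\tpolymer\tnumSeqs\n"
    "seqA\t1044\t1107\t13\t0\t2\t1\n"
    "seqB\t5286\t13862\t286\t0\t4\t3\n"
    "seqC\t5302\t13862\t292\t0\t4\t2\n"
    "seqD\t42635\t43116\t31\t0\t2\t1\n"
    "seqE\t43058\t43116\t13\t0\t2\t1\n"
)

if __name__ == "__main__":
    for info in describe(cluster_by_start(read_summary(EXAMPLE))):
        print(info)
```

With a real summary file, the text would be read from disk and passed to `read_summary`. If the short reads all fall into the outer clusters, that would be consistent with what the `minlength=250` screen is removing.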