Alignment

I’ve been analyzing low-biomass samples for which I sequenced the V1-V3 region on 454. After running the alignment, I noticed that the sequences aligned in three different places.

             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1044     1046     2        0       1        1
2.5%-tile:   1044     1107     13       0       2        79
25%-tile:    5286     13862    31       0       2        781
Median:      5302     13862    286      0       4        1561
75%-tile:    42635    43116    292      0       4        2341
97.5%-tile:  43058    43116    297      0       5        3043
Maximum:     43103    43116    308      0       8        3120
Mean:        15889.8  21156.4  186.295  0       3.50577

# of unique seqs: 946

total # of seqs: 3120

If I look at the summary file, most of the sequences in the middle range are between 250 and 300 bases long. The others are much shorter. Running screen.seqs with minlength=250 keeps only the sequences in the middle range, but I’m still wondering whether something else has gone wrong.

  1. Has anyone else had data that aligned in several different places?
  2. For the different variable regions, is there an expected range they should fall into for alignment (i.e. start and end values)?
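
One way to see how many sequences fall in each alignment window is to tally the start and end coordinates from the summary file. The following is only a rough sketch: it assumes the summary file is tab-delimited with a header row containing start, end, and numSeqs columns, and the function name, the bin size, and the example rows are all made up for illustration.

```python
import csv
import io
from collections import Counter


def tally_alignment_windows(summary_text, bin_size=1000):
    """Count sequences per (start, end) window.

    Coordinates are rounded down to the nearest multiple of bin_size so that
    sequences starting a few columns apart land in the same window.
    Assumes tab-delimited text with 'start', 'end' and 'numSeqs' columns.
    """
    reader = csv.DictReader(io.StringIO(summary_text), delimiter="\t")
    windows = Counter()
    for row in reader:
        start_bin = int(row["start"]) // bin_size * bin_size
        end_bin = int(row["end"]) // bin_size * bin_size
        # numSeqs is the number of reads a unique sequence represents;
        # fall back to 1 if the column is absent or empty.
        windows[(start_bin, end_bin)] += int(row.get("numSeqs") or 1)
    return windows


# Hypothetical rows, not taken from the real data set.
example = (
    "seqname\tstart\tend\tnbases\tambigs\tpolymer\tnumSeqs\n"
    "seqA\t1044\t1107\t13\t0\t2\t1\n"
    "seqB\t5286\t13862\t286\t0\t4\t3\n"
    "seqC\t5302\t13862\t292\t0\t4\t2\n"
    "seqD\t42635\t43116\t31\t0\t2\t1\n"
)

for window, count in sorted(tally_alignment_windows(example).items()):
    print(window, count)
```

For a real file, the text could be read with open(path).read() and passed to the same function. If the sequences cluster into a few distinct windows, comparing the counts per window shows how much data sits outside the expected region.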

Were they all sequenced in one direction, or in both? Generally the V1 forward primer picks up at 1044 and the V3 reverse primer ends at 11892, so I’m not sure what’s going on here…

Pat