filter.seqs query

Hey there,
I'm following the Costello analysis example.
I've reached the filter.seqs and then unique.seqs steps, and afterwards I ran summary.seqs:

             Start  End  NBases  Ambigs  Polymer
Minimum:     1      473  89      0       3
2.5%-tile:   1      473  105     0       3
25%-tile:    1      473  107     0       3
Median:      1      473  119     0       4
75%-tile:    1      473  123     0       4
97.5%-tile:  1      473  135     0       5
Maximum:     1      473  283     0       8

# of unique seqs: 14885

total # of seqs: 41974

My concern is that there is such a large range for NBases. I'm analyzing V1/V2-ish region sequences read from the forward primer (8/27F).

Shaun

So the V1 region does have some known length heterogeneity, but the 283 does seem weird. Have you tried to fish that sequence out and see who it is? Some groups like TM7 do have long insertions/introns in the V1 region. It could also be that the sequence is garbage.
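
If it helps, one way to fish it out is to scan the per-sequence summary file that summary.seqs writes alongside the on-screen table. Below is a minimal Python sketch, assuming that file is tab-delimited with a header row containing `seqname` and `nbases` columns (check your own file first). The function name, the threshold of 200, and the example rows and sequence names are all made up for illustration.

```python
import csv
import io


def long_sequences(summary_lines, min_nbases):
    """Return (name, nbases) pairs for rows with nbases >= min_nbases.

    Assumes a tab-delimited table whose header includes 'seqname'
    and 'nbases' columns (an assumption about the summary file layout).
    """
    reader = csv.DictReader(summary_lines, delimiter="\t")
    hits = []
    for row in reader:
        nbases = int(row["nbases"])
        if nbases >= min_nbases:
            hits.append((row["seqname"], nbases))
    return hits


# Hypothetical rows in the assumed layout; replace with open("your.summary").
example = io.StringIO(
    "seqname\tstart\tend\tnbases\tambigs\tpolymer\tnumSeqs\n"
    "seqA\t1\t473\t119\t0\t4\t12\n"
    "seqB\t1\t473\t283\t0\t8\t1\n"
    "seqC\t1\t473\t105\t0\t3\t3\n"
)

outliers = long_sequences(example, min_nbases=200)
print(outliers)  # [('seqB', 283)]
```

The names it returns could then be saved one per line and used as an accnos file with get.seqs to pull the sequence out of the fasta file, so you can BLAST or classify it and see whether it looks like a TM7-type insertion or junk.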