Screen.seqs with different start-end positions

Hi all

I’m having trouble finding the start and end positions for screen.seqs after alignment (I used the silva alignment database). I’m following the 454 SOP, and I sequenced the V6-V9 region. I went through the Sogin example and noticed that it skips the screen.seqs step and runs the filter.seqs command immediately. Would that also be better for my dataset?

summary.seqs(fasta=IB90NV007.MID1.site_1.trim.unique.align)

             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1044     5292     185      0       3        1
2.5%-tile:   1784     10358    267      0       4        113
25%-tile:    21991    37426    405      0       4        1130
Median:      28460    41444    416      0       5        2260
75%-tile:    31187    42568    421      0       5        3389
97.5%-tile:  35131    43116    430      0       7        4406
Maximum:     40873    43117    555      0       8        4518
Mean:        24894.2  37640    405.431  0       4.74989

# of Seqs: 4518

Thank you
K.

Hi Karen,

You don’t really want to follow the Sogin example; it’s very dated at this point. Do you know which direction the sequencing was done in? Did they start at V9 and work back to V6, or go from V6 towards V9? I suspect you want to set end to something like 37000 and then optimize the start position. Regardless, it is a bit weird that the sequences don’t all start (or end) at a similar coordinate. Do you have more details on what was done?

Pat

Hi Pat

Thank you for your reply. Sequencing was done from V6 to V9. I think I made a mistake in my oligos file, so trim.seqs and the downstream commands gave weird results. I ran everything again this morning, and the summary.seqs output after alignment (with the silva database and flip=T) looks much better. Can I use 31189 as the start position and optimize the end?

mothur > summary.seqs(fasta=IB90NV007.MID1.site_1.trim.unique.align, name=IB90NV007.MID1.site_1.trim.names)

Using 2 processors.

             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1044     1048     3        0       2        1
2.5%-tile:   31189    40863    300      0       4        161
25%-tile:    31189    42537    412      0       4        1602
Median:      31189    42537    417      0       5        3204
75%-tile:    31189    42546    421      0       5        4806
97.5%-tile:  32398    42546    427      0       6        6247
Maximum:     43103    43116    501      0       8        6407
Mean:        31232.4  42338.6  409.548  0       4.78914

# of unique seqs: 4327
total # of seqs: 6407

Thank you
Karen

Yep, that looks right
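
For anyone following along, a minimal sketch of the screen.seqs call being confirmed here might look like the following. The file names and the start value are taken from the summary.seqs output above. The criteria=95 setting is an assumption (no value is given in this thread), and the follow-up check assumes a mothur version that accepts the "current" keyword for the most recently produced files. The lines beginning with # are comments intended for a batch file.

```
# Sketch only: criteria=95 is an assumed value, not one stated in this thread.
# Remove sequences that start after 31189 and let mothur pick the end position.
screen.seqs(fasta=IB90NV007.MID1.site_1.trim.unique.align, name=IB90NV007.MID1.site_1.trim.names, start=31189, optimize=end, criteria=95)

# Re-check the alignment coordinates of the sequences that passed the screen.
summary.seqs(fasta=current, name=current)
```

With start=31189, any sequence that begins after that coordinate is dropped. The summary above shows the 97.5%-tile start at 32398, so some sequences will be removed, and it is worth comparing the sequence counts before and after the screen.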