loss of sequences during screen.seqs

Hi all,

I prepared an alignment of 16S 454 data from relatively long PCR products that were sequenced from both ends (using Roche/454 GS-FLX+ Titanium); the reads overlap in a central region of the 16S gene. So approximately half of the sequences start at the 5’-end (alignment column 1044) and the other half come from the 3’-end (ending at alignment column 28464). When I use screen.seqs with the “start” or “end” options, I always lose a lot of sequences (roughly half), because I can keep either the sequences from the beginning (e.g. starting at column 1044) or those from the other side (ending at column 28464), but not both. My question is whether it is possible to run screen.seqs in such a way that it keeps the sequences that overlap in a certain region. Ideally I would like to keep either only a range of alignment columns, e.g. 5000 to 15000, or only the sequences that cover this region. Any ideas or suggestions?

Thanks! Stephan

Hi Chris,

So start=5000 will keep anything that starts at or before column 5000, and end=15000 will keep anything that ends at or after column 15000. Then if you run filter.seqs(vertical=T, trump=.) you’ll “bookend” everything so that the sequences only overlap in that region. If that doesn’t keep enough sequences, I would increase your start value and decrease your end value until you’re keeping as many sequences as you want. Of course, as you increase the number of sequences you keep, you decrease the length of the region they share.
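
To make the start/end logic concrete, here is a toy Python sketch. It is not mothur code, only an illustration of the rule as described above; the read names and coordinates are invented, and the function names are hypothetical.

```python
# Toy illustration of the start/end screening rule described above.
# NOT mothur's implementation; all read coordinates below are made up.

def passes_screen(read_start, read_end, start=None, end=None):
    """Keep a read if it starts at or before `start` and ends at or after `end`."""
    if start is not None and read_start > start:
        return False
    if end is not None and read_end < end:
        return False
    return True


def screen(reads, start=None, end=None):
    """Return the names of the reads that pass the screen."""
    return [name for name, s, e in reads if passes_screen(s, e, start, end)]


# (name, first aligned column, last aligned column); hypothetical values
reads = [
    ("fwd_1", 1044, 16000),   # forward read reaching past column 15000
    ("fwd_2", 1044, 12000),   # forward read that stops short of 15000
    ("rev_1", 4500, 28464),   # reverse read reaching back before 5000
    ("rev_2", 9000, 28464),   # reverse read that begins after 5000
]

only_forward = screen(reads, start=1044)            # loses all reverse reads
only_reverse = screen(reads, end=28464)             # loses all forward reads
overlap = screen(reads, start=5000, end=15000)      # keeps reads from both ends

print(only_forward)
print(only_reverse)
print(overlap)
```

With these made-up reads, screening on the outer columns keeps only one direction at a time, while screening on the inner region keeps any read from either direction that spans columns 5000 to 15000. Moving start up or end down admits more reads but narrows the shared region.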

Also, for what it’s worth, your approach is not “standard”; generally people only sequence in one direction, because of the kind of problems you’re probably encountering in analyzing your data.

Pat