summary max start and end at the same place?

So I just aligned a bunch of V6 16S sequences already characterized by VAMPS. I have two problems. First, upon alignment, the prompt tells me that some of my sequences generated alignments that eliminated too many bases. How can this be if they are already characterized and mapped to V6? I went back to the accnos file to check the sequences that were flagged this way, and they seemed fine.

Furthermore, when I run summary.seqs on the alignment, I get a minimum with start 0 and end 0, and a maximum that also starts and ends at the same number. Again, this doesn’t seem to make much sense given that all of these sequences have already been characterized by VAMPS.

thanks in advance,

Jon

Can you try flip=T? I wonder whether the sequences are backwards for some reason. Alternatively, can you post one of the putative V6 reads and we can take a look.
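For anyone unsure what "backwards" means here: as far as I understand it, flip=T lets align.seqs retry a poorly aligning read as its reverse complement. The Python sketch below only illustrates that operation; the function name is made up for the example and is not mothur's internal code.

```python
# Translation table for DNA bases; N is left as N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # A read sequenced from the opposite strand would need this
    # transformation before it lines up with the reference.
    print(reverse_complement("AAAACTTGAC"))
```

If a read only aligns well after this transformation, it was most likely submitted in the opposite orientation to the reference alignment.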

I tried flip=T, and I’m still getting this:

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1044 1044 1 0 1 1
2.5%-tile: 31189 32828 56 0 3 2615
25%-tile: 31189 33183 60 0 3 26150
Median: 31189 33183 60 0 3 52300
75%-tile: 31189 33183 63 0 4 78450
97.5%-tile: 31189 33289 71 0 5 101985
Maximum: 43116 43116 74 0 8 104599
Mean: 31195.3 33142.8 60.5491 0 3.51607

# of Seqs: 104599

any ideas?

For reference, this is what the summary.seqs output looked like prior to alignment:

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 51 51 0 2 1
2.5%-tile: 1 57 57 0 3 2615
25%-tile: 1 60 60 0 3 26150
Median: 1 60 60 0 3 52300
75%-tile: 1 64 64 0 4 78450
97.5%-tile: 1 71 71 0 5 101985
Maximum: 1 74 74 0 10 104599
Mean: 1 61.6481 61.6481 0 3.53819

# of Seqs: 104599

and here’s a V6 read:

AAAACTTGACATCTACGGCAAAGCTATGGAAGTGTAGTGGAGGTTAACCGTAAGAC

Aligning that sequence against silva.bacteria.fasta, the alignment was 80.36% similar to GenBank accession D78648, which is a Ureaplasma. I get the following from summary.seqs…

Start End NBases Ambigs Polymer NumSeqs
Minimum: 31189 33183 56 0 4 1
2.5%-tile: 31189 33183 56 0 4 1
25%-tile: 31189 33183 56 0 4 1
Median: 31189 33183 56 0 4 1
75%-tile: 31189 33183 56 0 4 1
97.5%-tile: 31189 33183 56 0 4 1
Maximum: 31189 33183 56 0 4 1
Mean: 31189 33183 56 0 4

It didn’t have to be flipped to get this and there wasn’t a warning. Do you have another sequence that produces the warning on your end?

Pat
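A note on reading these tables, since it bears on the original question. Start and End are column positions in the reference alignment (the first and last non-gap base), and to my understanding each column of the summary is sorted on its own, so a "Maximum" row can combine values from different reads. The sketch below is an illustration under those assumptions; the helper names and the toy aligned strings are invented, and reporting 0/0 for a read with no aligned bases is my guess at how the 0/0 minimum arises.

```python
GAP_CHARS = {".", "-"}


def alignment_coords(aligned: str):
    """Return (start, end, nbases) in 1-based alignment columns.

    A string with no bases at all yields (0, 0, 0).
    """
    positions = [i + 1 for i, ch in enumerate(aligned) if ch not in GAP_CHARS]
    if not positions:
        return (0, 0, 0)
    return (positions[0], positions[-1], len(positions))


def column_extremes(rows):
    """Min and max of each column, taken independently across reads."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


if __name__ == "__main__":
    toy = ["..ACGT....", "........A."]  # hypothetical aligned reads
    rows = [alignment_coords(s) for s in toy]
    minimum, maximum = column_extremes(rows)
    print(rows)     # per-read (start, end, nbases)
    print(minimum)  # column-wise minimum
    print(maximum)  # column-wise maximum
```

In the toy data the column-wise maximum has the same start and end even though the longest read spans several columns, which is the same pattern as a Maximum row where Start equals End next to a large NBases.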

So I didn’t see the thread two topics below which asks the same question. Sorry to flood the board.

I ran screen.seqs on the ones that began at 31189 and took a look at the bad.accnos file. These are some of the sequences I came up with:

AAGGCTTGACATGCTAGAAGTAGAAACTCGAAAGAGGGACGATCTGTATCCAATCAGGAGTTAGCAC

AAGGCTTGACATGTAACTGCTAATCCTGTGAAAGCAGGATTCCTTTGAGGGTGTTACAC

AAGGCTTGATATGTCGGAAGTAGCGACCTGAAAGGGAAGCCACCTGTTGAGTCAGGAACCGTCAC

just the top three. thanks for your help.
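For readers following along, the screening step can be pictured as a simple coordinate filter: keep reads that start at or before a chosen column and end at or after another, and send everything else to a "bad" list. This is a rough Python sketch of that idea, not mothur's implementation; the read names and the function are hypothetical, and the coordinates are borrowed from the summary tables in this thread.

```python
def screen_by_coords(coords, start, end):
    """Split read names into (good, bad).

    coords maps a read name to its (start, end) alignment columns.
    A read is good if it starts at or before `start` and ends at or
    after `end`; otherwise it is bad.
    """
    good, bad = [], []
    for name, (s, e) in coords.items():
        if s <= start and e >= end:
            good.append(name)
        else:
            bad.append(name)
    return good, bad


if __name__ == "__main__":
    # Hypothetical reads using coordinates seen in the summaries above.
    coords = {
        "readA": (31189, 33183),
        "readB": (31189, 32828),
        "readC": (1044, 1044),
    }
    good, bad = screen_by_coords(coords, start=31189, end=33183)
    print("good:", good)
    print("bad:", bad)
```

Reads like readB and readC land in the bad list because they stop short of the expected end of the V6 region, even though they may be genuine 16S fragments.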

I guess I should have thought of this initially, but I failed to remember (PTSD, repressed memories, etc.) that V6 is really wacky. At times it can be very difficult to align because it has such a high rate of evolution. So the silva.bacteria.fasta database may be limited in the taxa represented by these fragments. The other side of the coin is that they’re so wacky that you’d never know whether they’re garbage or real sequence fragments. I believe VAMPS was/is doing pairwise alignments, which will align anything together even if it isn’t truly appropriate to align the fragments. Looking back at your original data, if you screen with start=31189 and end=33183, you should still have a ton of data left over to work with.

Hope this was helpful…
Pat

Makes sense, thanks so much. The thing, though, is that this is a poorly sampled environment (deep ocean), and among the poorly aligned sequences are reads that are pretty abundant (up to 17 or so percent of the total reads), which leads me to believe that they aren’t artifacts. When I BLAST them (blastn), they map to 16S, but many map to unknown orders, unknown phyla, etc., which of course is what we’re looking for. How would you go about this problem?

thanks again
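One way to put numbers on the abundance claim above is to count identical reads and express each as a percentage of the total. The sketch below assumes the poorly aligned reads are available as plain strings; the function name and the toy reads are invented for illustration.

```python
from collections import Counter


def relative_abundance(reads):
    """Return (sequence, count, percent) tuples, most abundant first."""
    counts = Counter(reads)
    total = sum(counts.values())
    return [(seq, n, 100.0 * n / total) for seq, n in counts.most_common()]


if __name__ == "__main__":
    toy_reads = ["AAG", "AAG", "AAC", "AAG"]  # hypothetical reads
    for seq, n, pct in relative_abundance(toy_reads):
        print(f"{seq}\t{n}\t{pct:.1f}%")
```

A unique read that keeps a high percentage across independent samples is harder to dismiss as a sequencing artifact than one that shows up once.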

One thing that I wish Sogin et al. had done to really bolster the idea of the rare (or not so rare) biosphere is to design primers to these novel V6 tags and then sequence back to a conserved primer using Sanger or another technology. Then you’d have a lot more data that you could use to make taxonomic assignments. In the end, I think names are mostly made up and tell us very little about function - I’m very content talking about OTUX - of course I get that most others aren’t!