alignment problem?

When I align my pyrosequencing data, some of the sequences are (severely) truncated. For example, the 233 bp sequence named FYV22AL01DK3AI becomes just 8 bp (with leading and trailing “.”) in the alignment when using the basic align command: align.seqs(candidate=0.11B.fasta,template=silva.bacteria,ksize=9,processors=2)

When I align it with the RDP/Pyro aligner it looks…normal.

I’ll email the .fasta file to Pat for analysis.

The sequence FYV22AL01DGF7S gives the following info in the align.report file:

QueryName QueryLength TemplateName TemplateLength SearchMethod SearchScore AlignmentMethod QueryStart QueryEnd TemplateStart TemplateEnd PairwiseAlignmentLength GapsInQuery GapsInTemplate LongestInsert SimBtwnQuery&Template
FYV22AL01DGF7S 222 DQ663168.1 1492 kmer 6.05 needleman 1 10 1481 1492 12 2 0 0 75

So, align.seqs recognizes that it is a 222 bp sequence, but it uses a query length of only 10 bp (QueryStart 1 to QueryEnd 10).

The alignment it returns is all periods except for this: AA-GG----G-C-CG-T--G (which is 10 bp of non-gaps). Everything else in this sequence in the alignment is a period.

How do I interpret this?

I just noticed that the silva alignment is RNA sequences, but mine are DNA sequences (as are those returned by RDP). Could that be a problem?

I just replaced the U’s with T’s in the silva alignment, and the problem remains. At least one 222bp sequence becomes only 10bp long upon alignment.
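For anyone repeating this check, here is a minimal Python sketch of the U-to-T replacement. It assumes the template is an ordinary FASTA file already read into a list of lines; the function name rna_to_dna is made up for illustration and is not part of mothur. Header lines (starting with ">") are left alone so that a "U" in a sequence name is not altered.

```python
def rna_to_dna(fasta_lines):
    """Replace U with T on sequence lines only; header lines are kept as-is."""
    converted = []
    for line in fasta_lines:
        if line.startswith(">"):
            # Header line: do not touch sequence names.
            converted.append(line)
        else:
            # Sequence line: gaps ('-' and '.') are unaffected by the replacement.
            converted.append(line.replace("U", "T").replace("u", "t"))
    return converted


example = [">seqU", "..AC-GU..", "acgu"]
print(rna_to_dna(example))
```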

So, it isn’t RNA versus DNA.

Here is the relevant line from the .report file:

QueryName QueryLength TemplateName TemplateLength SearchMethod SearchScore AlignmentMethod QueryStart QueryEnd TemplateStart TemplateEnd PairwiseAlignmentLength GapsInQuery GapsInTemplate LongestInsert SimBtwnQuery&Template
FYV22AL01DGF7S 222 DQ663168.1 1492 kmer 6.05 needleman 1 10 1481 1492 12 2 0 0 75

So, it is a 222bp query, but the length of the query actually used is 10bp.
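To find every sequence with this symptom rather than checking them one by one, the report can be scanned for rows where the aligned span (QueryEnd - QueryStart + 1) is much shorter than QueryLength. The sketch below is only an illustration: the function name find_truncated and the 50% cutoff are my own assumptions, and it splits on whitespace, which works because none of the column names in the header contain spaces.

```python
def find_truncated(report_lines, min_fraction=0.5):
    """Return (name, query_length, aligned_length) for rows whose aligned
    span is shorter than min_fraction of the full query length."""
    rows = [line.split() for line in report_lines if line.strip()]
    header, records = rows[0], rows[1:]
    col = {name: i for i, name in enumerate(header)}
    flagged = []
    for rec in records:
        query_length = int(rec[col["QueryLength"]])
        # Span of the query that actually took part in the alignment.
        aligned = int(rec[col["QueryEnd"]]) - int(rec[col["QueryStart"]]) + 1
        if aligned < min_fraction * query_length:
            flagged.append((rec[col["QueryName"]], query_length, aligned))
    return flagged


# The header and row quoted above from the align.report file.
report = [
    "QueryName QueryLength TemplateName TemplateLength SearchMethod SearchScore "
    "AlignmentMethod QueryStart QueryEnd TemplateStart TemplateEnd "
    "PairwiseAlignmentLength GapsInQuery GapsInTemplate LongestInsert "
    "SimBtwnQuery&Template",
    "FYV22AL01DGF7S 222 DQ663168.1 1492 kmer 6.05 needleman 1 10 1481 1492 12 2 0 0 75",
]
print(find_truncated(report))
```

The cutoff is arbitrary; a stricter or looser value may suit other data sets better.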

Curiously, the 10bp used (and returned in the alignment) are the first 10 bp of the original sequence. I don’t see anything unusual in the next few bp in the original assignment.

James - I’m pretty sure that the sequences you’re trying to align are “backward”. The next version will give you the option to automatically flip a sequence and try again if this happens. For now, you might try running reverse.seqs on the sequences and re-aligning.

Yes, I think this is probably the problem.

Won’t reverse.seqs reverse ALL the sequences though? It looks like some of mine are reversed and some are not.

Yeah, it will. The next version of mothur will flip sequences and try to align those if they’ve been massively truncated. It looks like you have pyrotags; how did you get them in both directions?

Yes indeed, these are pyrotags, at the 3’ end only. I don’t know how I got some forward and some backward. Perhaps that happened before I saw the data; someone else did the original QC.
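For a data set with mixed orientations like this one, a possible workaround until the automatic flip is available is to reverse-complement only the sequences that came out truncated and then re-align. The Python sketch below is not a mothur feature: the helper names are hypothetical, the sequences are assumed to be unaligned DNA held as (name, sequence) pairs, and the set of names to flip has to be supplied by the user (for example, from the truncated rows of the align.report file).

```python
# Complement table for unaligned DNA; N is kept as N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the whole sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def flip_selected(records, names_to_flip):
    """Reverse-complement only the records whose name is in names_to_flip."""
    return [
        (name, reverse_complement(seq) if name in names_to_flip else seq)
        for name, seq in records
    ]


# Made-up toy sequences; only the second one is flagged for flipping.
records = [("seq_forward", "AAGG"), ("seq_backward", "AAGG")]
print(flip_selected(records, {"seq_backward"}))
```

Sequences that are still truncated after flipping probably have a different problem and are worth inspecting by hand.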