align.seqs shortens some pyrosequences

Hi Pat,
I am using Mothur to analyze microbial diversity in mangrove sediment samples. I noticed that the “align.seqs” command considerably shortens some sequences (a few thousand out of 106,000).
For example, in my all.trim.unique.fasta file, one sequence is AGTCCGGCTACCCATCAGAGCCTTGGTGAGCCGTTACCTCACCAACAAGCTAATAGGACATAGGCCGCTCCCCGGGCAGAGGGTTGCCCGACCGTTTACACTTCGGAAGATGCCATCCGAGGTGACCATCCGGTATTACCTGCCGTTTCCAGCAGCTATTCCGGTCCCGAGGGTACGTTGCCTATGTATTACTCACCCTTTCGCCGCTCTCCAGCACCCCGAAGGATGCCTTCGCGCTCGACTTGCATGCCTAAACCACGCCGCCAGCGTTCACT
(275 bp in total)
After alignment against the SILVA database (downloaded from the MOTHUR website):

align.seqs(candidate=all.trim.unique.fasta, template=silva.bacteria.fasta, flip=t, processors=2)

the same sequence looks like this in the all.trim.unique.align file (12 bp in total):
…CCAGCGTTCACT…
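To see how many reads are affected this way, one can compare each read's raw length with the number of bases left in its aligned version. The following is a minimal sketch, not a mothur feature. It assumes that '-' and '.' are the only gap characters in the .align file. The reads it uses are hypothetical stand-ins built to mirror the 275 bp → 12 bp case above.

```python
# Minimal sketch: how much of a read survives alignment?
# Assumption: '-' and '.' are the only gap characters in the aligned file.
GAP_CHARS = set("-.")

def ungapped_length(aligned: str) -> int:
    """Count the bases in an aligned sequence, ignoring gap characters."""
    return sum(1 for c in aligned if c not in GAP_CHARS)

def retained_fraction(raw: str, aligned: str) -> float:
    """Fraction of the raw read's bases still present after alignment."""
    return ungapped_length(aligned) / len(raw)

# Hypothetical 275 bp read whose last 12 bp are the only part that aligned
raw = "A" * 263 + "CCAGCGTTCACT"
aligned = "." * 40 + "CCAGCG" + "--" + "TTCACT" + "." * 40

print(ungapped_length(aligned))                    # 12
print(round(retained_fraction(raw, aligned), 3))   # 0.044
```

A read that keeps only about 4% of its bases, like the one above, is a strong hint that most of it did not resemble anything in the template.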

What is happening here? Should I use another reference database?
Thanks for your help,
Isabelle

I suspect these are mostly garbage sequences. If you take that sequence and BLAST it against the GenBank nt database (excluding uncultured sequences), you’ll get a bunch of awful alignments. The top match covers only the 5’ end of your sequence and is only 76% identical to some members of the Planctomycetales. While it’s entirely possible that you have a new domain of life, it is unlikely given your alignment. Also, when I BLASTed the last 200 bp of your sequence, there were no significant matches. I’d remove these bad aligners using screen.seqs and move on without worrying.

Pat
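For readers who want to see the idea behind removing bad aligners, here is a minimal Python sketch of length-based screening on an aligned FASTA file. It is an illustration only, not mothur's screen.seqs. mothur's own command offers its own length- and position-based criteria, such as its minlength option. In the sketch, MIN_LENGTH is a hypothetical cutoff, parse_fasta and screen_by_length are hypothetical helpers, and '-' and '.' are assumed to be the only gap characters.

```python
# Minimal sketch of length-based screening of aligned reads.
# Not mothur's screen.seqs; MIN_LENGTH and the helper names are hypothetical.
from typing import Dict, Tuple

GAP_CHARS = set("-.")  # assumption: the only gap characters in the .align file
MIN_LENGTH = 200       # hypothetical cutoff in bp; pick one suited to your reads

def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records

def screen_by_length(records: Dict[str, str],
                     min_length: int = MIN_LENGTH) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split records into (kept, removed) by the number of non-gap bases."""
    kept: Dict[str, str] = {}
    removed: Dict[str, str] = {}
    for name, seq in records.items():
        bases = sum(1 for c in seq if c not in GAP_CHARS)
        (kept if bases >= min_length else removed)[name] = seq
    return kept, removed

# Hypothetical input: read1 keeps 240 bases, read2 keeps only 12
example = (">read1\n" + "." * 10 + "ACGT" * 60 + "...\n"
           ">read2\n" + "." * 50 + "CCAGCGTTCACT" + "." * 50 + "\n")
kept, removed = screen_by_length(parse_fasta(example))
print(sorted(kept), sorted(removed))  # ['read1'] ['read2']
```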