mothur

Align.seqs SimBtwnQuery&Template


#1

Hi,

I am trying to understand why the align.seqs report file gives a SimBtwnQuery&Template of 66% for a contig that has only one gap relative to the reference. Here is the report line (with header):

QueryName	QueryLength	TemplateName	TemplateLength	SearchMethod	SearchScore	AlignmentMethod	QueryStart	QueryEnd	TemplateStart	TemplateEnd	PairwiseAlignmentLength	GapsInQuery	GapsInTemplate	LongestInsert	SimBtwnQuery&Template
RD1103_11953_7844	324	PSTVD_RG-L1-no-primers	323	kmer	97.79	needleman	1	324	1	323	324	0	1	1	66.56

Here is the alignment between this contig and the reference (by Clustal Omega):

PSTVD_RG-L1-no-primers      TTTTTCTCTATCTTACTTGCTCCGGGGCGAGGGTGTTTAGCCCTTGGAACCGCAGTTGGT	60
RD1103_11953_7844           TTTTTCTCTATCTTACTTGCTCCGGGGCGAGGGTGTTTAGCCCTTGGAACCGCAGTTGGT	60
                            ************************************************************

PSTVD_RG-L1-no-primers      TCCTCGGAACTAAACTCGTGGTTCCTGTGGTTCACACCTGACCTCCTGACAAGAAAAGAA	120
RD1103_11953_7844           TCCTCGGAACTAAACTCGTGGTTCCTGTGGTTCACACCTGACCTCCTGACAAGAAAAGAA	120
                            ************************************************************

PSTVD_RG-L1-no-primers      AAAAGAAGGCGGCTCGGAGGAGCGCT-TCAGGGATCCCCGGGGAAACCTGGAGCGAACTG	179
RD1103_11953_7844           AAAAGAAGGCGGCTCGGAGGAGCGCTTTCAGGGATCCCCGGGGAAACCTGGAGCGAACTG	180
                            ************************** *********************************

PSTVD_RG-L1-no-primers      GCAAAAAAGGACGGTGGGGAGTGCCCAGCGGCCGACAGGAGTAATTCCCGCCGAAACAGG	239
RD1103_11953_7844           GCAAAAAAGGACGGTGGGGAGTGCCCAGCGGCCGACAGGAGTAATTCCCGCCGAAACAGG	240
                            ************************************************************

PSTVD_RG-L1-no-primers      GTTTTCACCCTTCCTTTCTTCGGGTGTCCTTCCTCGCGCCCGCAGGACCACCCCTCGCCC	299
RD1103_11953_7844           GTTTTCACCCTTCCTTTCTTCGGGTGTCCTTCCTCGCGCCCGCAGGACCACCCCTCGCCC	300
                            ************************************************************

PSTVD_RG-L1-no-primers      CCTTTGCGCTGTCGCTTCGGCTAC	323
RD1103_11953_7844           CCTTTGCGCTGTCGCTTCGGCTAC	324
                            ************************

Why is the SimBtwnQuery&Template so low for a single gap? If I were to screen these aligned sequences based on similarity (e.g. screen.seqs using the align report), I would surely lose this one, even with a loose cutoff of 70%.
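
As a sanity check, the identity implied by the Clustal Omega alignment above can be computed directly. This is only a minimal Python sketch: the sequences are copied from the alignment blocks, the helper name `percent_identity` is made up for illustration, and identity is defined here as matching columns divided by alignment length, which is an assumption and not necessarily the formula mothur uses for SimBtwnQuery&Template.

```python
# Blocks shared by both sequences, copied from the Clustal Omega alignment.
B1 = "TTTTTCTCTATCTTACTTGCTCCGGGGCGAGGGTGTTTAGCCCTTGGAACCGCAGTTGGT"
B2 = "TCCTCGGAACTAAACTCGTGGTTCCTGTGGTTCACACCTGACCTCCTGACAAGAAAAGAA"
B4 = "GCAAAAAAGGACGGTGGGGAGTGCCCAGCGGCCGACAGGAGTAATTCCCGCCGAAACAGG"
B5 = "GTTTTCACCCTTCCTTTCTTCGGGTGTCCTTCCTCGCGCCCGCAGGACCACCCCTCGCCC"
B6 = "CCTTTGCGCTGTCGCTTCGGCTAC"

# The third block is the only one that differs (one gap in the reference).
REF_B3 = "AAAAGAAGGCGGCTCGGAGGAGCGCT-TCAGGGATCCCCGGGGAAACCTGGAGCGAACTG"
QRY_B3 = "AAAAGAAGGCGGCTCGGAGGAGCGCTTTCAGGGATCCCCGGGGAAACCTGGAGCGAACTG"

reference = B1 + B2 + REF_B3 + B4 + B5 + B6  # PSTVD_RG-L1-no-primers
query = B1 + B2 + QRY_B3 + B4 + B5 + B6      # RD1103_11953_7844


def percent_identity(a, b, gap="-"):
    """Return (matches, columns, percent) for two aligned sequences.

    A column counts as a match only if both residues are identical
    and neither is a gap. Percent is matches / alignment length.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    columns = len(a)
    return matches, columns, 100.0 * matches / columns


matches, columns, pct = percent_identity(reference, query)
print(f"{matches} matching columns out of {columns}: {pct:.2f}% identity")
```

Under that definition the alignment shown above gives 323 matching columns out of 324, so roughly 99.69% identity, which is why the 66.56 in the report looks surprising.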

Thanks for your help!

Phil


#2

Can you post those two sequences as they occur in the fasta and reference files you use in align.seqs? (Also, I've never screened based on this column, so your mileage may vary.)