align output

newbie · October 26, 2012, 6:17pm

Hi Everyone,
Could anyone explain, or point me to a paper that explains, the difference between an aligned and an unaligned database? Also, could anyone explain the output of the align.seqs() command?
I have this output:
…
…
…
…
…
…
…
…
…T-------AC---GT-AG-GGT------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------------------------------------------------------GCG
-A-G---------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------C--G--T---T--AA-T-CGG-AA------TT-A--C-T--GG-GC------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------GT--A-----AA-GC-GT-GC-------G-CA-G-G-C-G---------------G--T-TA-T-A-T---

-----------------AA----G-A-C-A-----------------------------------------------------G-T-T--G--TG--A-AA-TC--C-C-CG-G-G-----------------------------------------
-----------------------------------------------------------------------------------------------------CT-C-AA-------------------------------------------------
---------------------------------------------------------------------------------------------------------------C-C-T-G-G-G-A--A-T----T-G--C-A---T--C---------
-------------------T--GT-G-A---C----------------------------------------------------T--G-T--AT--A-G-C--------------------------------------------------------

------------------------------------------------------------------------------T-A-G-A-G-T--A-----C-GG------TA-G-A---------------------G-G-G-G---GA-T---------

-------------------------------------------------------------------------------GG--A--ATT--------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------C-C-G-C-GT…
…
…
…
…
…
…
There are more dots at the start and at the end; I have not included them all to save space.
Why are there so many hyphens in between?
Thanks for any help!!!

pschloss · October 26, 2012, 8:11pm

dots=beginning and end of sequence / missing data
dashes = gaps in alignment

The alignment is 50k columns wide so that 16S and 18S sequences can be compared in the same alignment, with plenty of room for padding.
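
To make the dots-versus-dashes distinction concrete, here is a minimal Python sketch (not part of mothur) that strips both padding characters from an aligned row to recover the plain bases. The function name `degap` and the shortened example row are made up for illustration; the row only imitates the style of the output posted above.

```python
def degap(aligned: str) -> str:
    """Remove alignment padding: '.' (outside the sequence) and '-' (gaps)."""
    return aligned.replace(".", "").replace("-", "")


# A shortened, hypothetical aligned row in the same style as the output above.
aligned_row = "....T-------AC---GT-AG-GGT------GCG-A-G...."

# Only the bases remain once the padding is stripped.
print(degap(aligned_row))  # TACGTAGGGTGCGAG
```

The bases themselves are untouched by alignment; only their column positions change.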

newbie · October 26, 2012, 8:22pm

Thanks Dr. Schloss,
Why is there not a continuous stretch of matching sequence, like this?
…
…
…
…TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGT…
…
…
…
…
…
In the align.report file I am getting 100% similarity between query and template, so why are there so many gaps between the bases, like this?
---------------------------C--G--T---T--AA-T-CGG-AA------TT-A--C-T--GG-GC------------------------------------------------------------------------------------

Thanks!!!

pschloss · October 26, 2012, 8:24pm

Because it’s an alignment… To preserve positional homology across all sequences you have to insert gaps.
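
A toy illustration of positional homology, as a hedged Python sketch: the two sequences and the helper `column_of_base` are hypothetical and far smaller than a real 16S alignment. One sequence carries two extra bases, so the other receives gaps to keep the shared bases in the same columns.

```python
GAP_CHARS = ".-"


def column_of_base(aligned: str, n: int) -> int:
    """Return the 0-based alignment column holding the n-th base (0-based)."""
    seen = -1
    for column, ch in enumerate(aligned):
        if ch not in GAP_CHARS:
            seen += 1
            if seen == n:
                return column
    raise IndexError("sequence has fewer bases than requested")


# Hypothetical toy alignment: seq_b has two extra bases ("TT") after "ACG",
# so seq_a gets two gaps there to keep the shared "TAC" in the same columns.
seq_a = "ACG--TAC"
seq_b = "ACGTTTAC"

# The first base of the shared "TAC" is base 3 in seq_a but base 5 in seq_b,
# yet both sit in alignment column 5.
print(column_of_base(seq_a, 3))  # 5
print(column_of_base(seq_b, 5))  # 5
```

With thousands of diverse reference sequences instead of two, almost every sequence needs gaps somewhere, which is why a single row looks mostly like dashes.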

newbie · October 26, 2012, 9:07pm

Thanks Dr. Schloss,
I am sorry, but I still do not understand. As far as I understand, if there is 100% similarity then it should look like this:
TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGAAT

But why are there so many gaps?

It would be great if you could provide a reference so that I can understand this better. I am a biology student dealing with bioinformatics for the first time.

Thanks for all the help!!!!

pschloss · October 27, 2012, 8:32am

Because there’s more than two sequences in the world :). I’d encourage you to read up on multiple sequence alignment. The gaps are there because of the tremendous genetic diversity in the 16S gene across bacteria.
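
The sketch below shows why gaps and 100% similarity are compatible. It uses a simplified identity measure that only compares columns where both rows have a base; this definition and the function name are assumptions for illustration, and the value in align.report may be computed differently. The example rows are made up.

```python
GAP_CHARS = ".-"


def identity_over_shared_columns(a: str, b: str) -> float:
    """Percent of matching bases over columns where both rows have a base."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same number of columns")
    pairs = [(x, y) for x, y in zip(a, b)
             if x not in GAP_CHARS and y not in GAP_CHARS]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical rows: the query carries the same bases in the same columns as
# the template, so the gap columns do not count against it.
template = "..T---AC--GT-AG.."
query = "..T---AC--GT-AG.."
one_mismatch = "..T---AC--GA-AG.."

print(identity_over_shared_columns(query, template))
print(identity_over_shared_columns(one_mismatch, template))
```

The gaps in the query row come from the reference alignment's column layout, not from any disagreement between query and template.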
