align output

Hi Everyone,
Could anyone explain, or point me to a paper that explains, the difference between an aligned and an unaligned database? Also, could anyone explain the output of the align.seqs() command?
I have this output:

…T-------AC—GT-AG-GGT------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------------------------------------------------------GCG
-A-G---------------------------------------------------------------------------------------------------------------------------------------------------------



---------------------------C–G--T—T–AA-T-CGG-AA------TT-A–C-T–GG-GC------------------------------------------------------------------------------------



--------------------------------------------------------------------------------------GT–A-----AA-GC-GT-GC-------G-CA-G-G-C-G---------------G–T-TA-T-A-T—

-----------------AA----G-A-C-A-----------------------------------------------------G-T-T–G--TG–A-AA-TC–C-C-CG-G-G-----------------------------------------
-----------------------------------------------------------------------------------------------------CT-C-AA-------------------------------------------------
---------------------------------------------------------------------------------------------------------------C-C-T-G-G-G-A–A-T----T-G–C-A—T–C---------
-------------------T–GT-G-A—C----------------------------------------------------T–G-T–AT–A-G-C--------------------------------------------------------

------------------------------------------------------------------------------T-A-G-A-G-T–A-----C-GG------TA-G-A---------------------G-G-G-G—GA-T---------

-------------------------------------------------------------------------------GG–A--ATT--------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------C-C-G-C-GT…

There are more dots at the start and at the end; I have not included all of them to save space.
Why are there so many hyphens in between?
Thanks for any help!!!

Dots = beginning and end of sequence / missing data
Dashes = gaps in the alignment

The alignment is 50k columns wide so that 16S and 18S sequences can be compared on the same alignment, with plenty of room for padding.
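To make the dots-and-dashes point concrete, here is a minimal Python sketch (not part of mothur) that strips both characters from an aligned sequence to recover the unaligned bases. The `degap` helper and the `fragment` string are made up for illustration; the fragment is a shortened stand-in modelled on the start of the output posted above, not the real 50k-column line.

```python
# Characters that carry no base information in an aligned sequence:
# '.' marks positions before/after the sequence (or missing data),
# '-' marks a gap inserted by the alignment.
GAP_CHARS = ".-"


def degap(aligned: str) -> str:
    """Return the unaligned sequence by removing dots and dashes."""
    return aligned.translate(str.maketrans("", "", GAP_CHARS))


# Hypothetical, shortened fragment for illustration only.
fragment = "..T-------AC---GT-AG-GGT----.."

print(degap(fragment))       # the bases alone, in their original order
print(len(fragment), "columns vs", len(degap(fragment)), "bases")
```

The bases and their order are unchanged by alignment; only the padding around and between them differs.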

Thanks Dr. Schloss,
Why is there not a continuous stretch of matching sequence, like this:



…TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGT…





In the align.report file I am getting 100% similarity between query and template, so why are there so many gaps between the bases, like this:
---------------------------C–G--T—T–AA-T-CGG-AA------TT-A–C-T–GG-GC------------------------------------------------------------------------------------

Thanks!!!

Because it’s an alignment… To preserve positional homology across all sequences, you have to insert gaps.

Thanks Dr. Schloss,
I am sorry, but I still do not understand. As far as I understand, if there is 100% similarity then it should look like this:
TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGAAT

But why are there so many gaps?

It would be great if you could provide a reference so that I can understand this better. I am a biology student and this is my first time dealing with bioinformatics.


Thanks for all the help!!!!

Because there are more than two sequences in the world :). I’d encourage you to read up on multiple sequence alignment. The gaps are there because of the tremendous genetic diversity in the 16S gene across bacteria.
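A toy illustration of that point, as a rough Python sketch: the three sequences and the `identity` helper below are invented for this example, and the calculation is a simple column-by-column comparison, not necessarily how mothur computes the similarity in align.report. The query matches the template perfectly, yet both carry gaps, because a third sequence in the same alignment has extra bases that need their own columns.

```python
GAP = "-"

# Hypothetical toy alignment, 8 columns wide.
template = "AC--GT-A"   # reference sequence the query was aligned against
query    = "AC--GT-A"   # identical to the template, base for base
other    = "ACTTGTCA"   # a different organism with three extra bases


def identity(a: str, b: str) -> float:
    """Percent identity over columns where at least one sequence has a base."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Skip columns that are gaps in both sequences; they exist only
    # to make room for bases found in other sequences.
    pairs = [(x, y) for x, y in zip(a, b) if not (x == GAP and y == GAP)]
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


print(identity(query, template))   # query vs template
print(query.replace(GAP, ""))      # the query's bases without gaps
```

Removing the gap columns from the query and template would break the column-to-column correspondence with `other`. Scale this up from three sequences to the full diversity of 16S and 18S genes and you get the long runs of dashes in the real output.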