Controlling alignment

Hi all,

I don’t know of a better place on the forum to ask this question, so pardon my enthusiasm :slight_smile:

I’d like to check the quality of my alignments. However, I haven’t been able to open the alignment in any of several programs (MEGA, ClustalX2, BioEdit). Even after upgrading the RAM to 4 GB, the programs still crash, so I suspect the problem is software-related.

I’d like to check the quality of my alignments before screening/filtering (i.e., the largest files), since I noticed some bad alignments in the filtered files (which will inflate the number of OTUs, I guess). So what do you use to open and check (or adjust) this many sequences?
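
One low-memory way to screen a large aligned FASTA without opening it in a viewer is to stream it one record at a time. Below is a minimal Python sketch, assuming a plain aligned FASTA in which `-` and `.` are gaps and lowercase letters mark bases the aligner flagged as out of place (as described further down this thread). The function names are hypothetical and not part of mothur or ARB.

```python
import sys
from typing import Iterable, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, aligned sequence) pairs without loading the whole file."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif line:
            chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        yield name, "".join(chunks)


def lowercase_report(records: Iterable[Tuple[str, str]], min_lower: int = 1):
    """Yield (name, lowercase bases, total bases) for sequences with flagged bases."""
    for name, seq in records:
        n_lower = sum(1 for c in seq if c.islower())
        n_bases = sum(1 for c in seq if c.isalpha())  # '-' and '.' are not counted
        if n_lower >= min_lower:
            yield name, n_lower, n_bases


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python check_lowercase.py my.align
    with open(sys.argv[1]) as fh:
        for name, n_lower, n_bases in lowercase_report(read_fasta(fh)):
            print(f"{name}\t{n_lower}\t{n_bases}")
```

Because both functions are generators, memory use stays roughly at the size of one sequence, regardless of how many sequences the file holds.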

Thanks

ARB :slight_smile:

We also have a command, align.check, that will tell you how well the 16S rRNA secondary structure is preserved.
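
The idea behind this kind of check can be sketched as follows: given a map of which alignment columns pair with each other in the reference structure, count for each sequence how many of those pairs can still form. This is a hypothetical Python illustration, not mothur’s align.check implementation; the `pairs` dictionary stands in for a secondary-structure map file, and only Watson-Crick and G-U pairs are treated as valid.

```python
from typing import Dict

# Canonical Watson-Crick pairs plus the G-U wobble pair (T is read as U).
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_summary(aligned: str, pairs: Dict[int, int]) -> Dict[str, int]:
    """Classify each reference stem pair for one aligned sequence.

    `pairs` is a hypothetical stand-in for a secondary-structure map: it maps
    a 0-based alignment column to its partner column, listing each pair once.
    """
    seq = aligned.upper().replace("T", "U")
    counts = {"paired": 0, "mismatched": 0, "gapped": 0}
    for i, j in pairs.items():
        a, b = seq[i], seq[j]
        if not (a.isalpha() and b.isalpha()):
            counts["gapped"] += 1      # at least one partner is a gap
        elif (a, b) in VALID_PAIRS:
            counts["paired"] += 1
        else:
            counts["mismatched"] += 1  # includes ambiguity codes such as N
    return counts


# Toy hairpin: columns 0, 1, 2 pair with columns 6, 5, 4.
STEM = {0: 6, 1: 5, 2: 4}
print(pairing_summary("GCAUAGC", STEM))  # {'paired': 2, 'mismatched': 1, 'gapped': 0}
print(pairing_summary("GC-UAGC", STEM))  # {'paired': 2, 'mismatched': 0, 'gapped': 1}
```

A sequence whose bases have been pushed into the wrong columns tends to lose paired positions, so a low "paired" count relative to its neighbours is one hint of a poor alignment.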

Pat

Dear Pat,

I’m not only worried about how well the alignments preserve the secondary structure, but about alignment quality in general.

I often see bases shifted by just a couple of positions, clearly out of place (shown in lower case):

-GA-A-CG------
-GA-A-CG------
-GA-G-C--g----
-GA-G-CG------
-GA-A-CG------
-GA-A-CG-G----
-GA-A-CG------
-GA-A-CG------
-GA-A-CG-A----
-GC-Acggtctg--
-GA-G-CG------
-GC-G-CG-C----
-GA-G-GG------
-aa---GG-G----
-GT-G-GG------
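
One rough way to find shifted bases without eyeballing every column is to count, for each column, how many sequences carry a base there: columns that hold a base in only a few sequences are candidates. The Python sketch below applies this to the excerpt above. The 20% threshold is an arbitrary assumption, and in a real alignment genuine insertions will also show up as sparse columns.

```python
from typing import List

# The 15 aligned rows quoted above (column 0 is the first character).
EXCERPT = [
    "-GA-A-CG------", "-GA-A-CG------", "-GA-G-C--g----", "-GA-G-CG------",
    "-GA-A-CG------", "-GA-A-CG-G----", "-GA-A-CG------", "-GA-A-CG------",
    "-GA-A-CG-A----", "-GC-Acggtctg--", "-GA-G-CG------", "-GC-G-CG-C----",
    "-GA-G-GG------", "-aa---GG-G----", "-GT-G-GG------",
]


def column_occupancy(rows: List[str]) -> List[int]:
    """Count, for each alignment column, the sequences that have a base there."""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all aligned sequences must have the same length")
    # isalpha() is True for upper- and lower-case bases, False for '-' and '.'
    return [sum(r[c].isalpha() for r in rows) for c in range(width)]


def sparse_columns(rows: List[str], max_fraction: float = 0.2) -> List[int]:
    """Columns with a base in at least one, but at most max_fraction, of the rows."""
    occupancy = column_occupancy(rows)
    limit = max_fraction * len(rows)
    return [c for c, n in enumerate(occupancy) if 0 < n <= limit]


print(column_occupancy(EXCERPT))  # [0, 15, 15, 0, 14, 1, 15, 14, 1, 6, 1, 1, 0, 0]
print(sparse_columns(EXCERPT))    # [5, 8, 10, 11]
```

Here the sparse columns all come from the row `-GC-Acggtctg--`. Column 9, where several single shifted bases land, holds a base in 6 of the 15 rows and so escapes this threshold, which shows the limits of a purely column-based check.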

So perhaps these can be corrected manually in nicely conserved blocks, where such aberrations are easy to spot, but that seems almost impossible in the more variable regions, where the bases are scattered over many columns, no? How do you handle this?
This will affect screening, filtering, chimera detection (or is Uchime independent of the alignment?) and the number of OTUs, right?
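
To see why a shifted base can matter downstream, take the third and fourth rows of the excerpt above: with the gaps removed they are the same read, GAGCG, but the final G sits in different columns. The Python sketch below uses a deliberately simplified distance (columns that are gaps in both rows are skipped; a base facing a gap, or two different bases, count as one difference). mothur’s own distance calculation has its own gap-handling options, so real values will differ.

```python
def degap(aligned: str) -> str:
    """Strip gap characters and normalise case to recover the raw read."""
    return "".join(c for c in aligned if c.isalpha()).upper()


def aligned_distance(a: str, b: str) -> float:
    """Simplified distance between two rows of the same alignment.

    Columns that are gaps in both rows are skipped; a base facing a gap,
    or two different bases, count as one difference each.
    """
    if len(a) != len(b):
        raise ValueError("rows must come from the same alignment")
    diffs = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if not x.isalpha() and not y.isalpha():
            continue  # shared gap column: ignored
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


row3 = "-GA-G-C--g----"
row4 = "-GA-G-CG------"
print(degap(row3) == degap(row4))    # True: same read once gaps are removed
print(aligned_distance(row3, row4))  # 2 differences over 6 columns, i.e. 1/3
```

In this tiny excerpt two identical reads end up a third apart. Over a full-length read a single shift contributes far less, but several such shifts can add up, which is how misplaced bases could push identical reads into separate OTUs.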

I’ve tried aligning my sequences with ARB, with little or no improvement.
Altering parameters will probably never eliminate all of these mistakes, will it?
Wouldn’t a “de novo” approach be better for calculating the number of OTUs?

I don’t mean to nag; I’m just wondering how others deal with this, or whether I’m misinterpreting something here … :slight_smile:

Kind regards.

PS, bug alert: reusing the reference database after saving it doesn’t seem to work. When realigning, the reference has to be read in again, and it is added on top of the memory already in use. (Or could this be because the save parameter is still set to true?)

Hi Kirk,

Yeah, we’ve seen a few cases of this. In general, you can go in and manually correct the misaligned portions of the reference alignment and then realign. The problem with de novo approaches is that they largely ignore the secondary structure. Methods like uclust and esprit overestimate the similarity between sequences because they aren’t forced to maintain homologous sites across multiple sequences.

Pat

The thing is, when I scan through the reference (silva.bacteria.fasta) in ARB, I don’t see any misalignments at those positions.

I took a subsample and checked the misaligned sequences, and the mistakes seemed consistent (the same faults for the same genus), so I was fairly confident that they wouldn’t have much impact.

However, I’ve just aligned some Flavobacteria, some of which have identical sequences, and here too I see misalignments, even among the identical ones. Strange.
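
Identical reads that end up aligned differently, as with these Flavobacteria, can be found automatically by grouping sequences on their degapped, upper-cased content and comparing how each copy was laid out. Below is a minimal Python sketch, assuming `.` and `-` are both gap characters and that a difference in case alone does not count as a different placement. The function name is hypothetical, and the `(name, aligned sequence)` records can come from any FASTA reader.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple


def inconsistent_duplicates(
    records: Iterable[Tuple[str, str]]
) -> Dict[str, List[List[str]]]:
    """Report identical reads that were placed differently in the alignment.

    Returns {degapped read: [names sharing one layout, names sharing another, ...]}
    for every read that occurs with more than one layout.
    """
    layouts: DefaultDict[str, DefaultDict[str, List[str]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for name, aligned in records:
        read = "".join(c for c in aligned if c.isalpha()).upper()
        # Treat '.' and '-' as the same gap and ignore case when comparing layouts.
        layout = aligned.upper().replace(".", "-")
        layouts[read][layout].append(name)
    return {
        read: sorted(groups.values())
        for read, groups in layouts.items()
        if len(groups) > 1
    }


records = [
    ("s1", "-GA-G-CG--"),
    ("s2", "-GA-G-C--g"),
    ("s3", "-GA-G-CG.."),
    ("s4", "-GA-A-CG--"),
]
print(inconsistent_duplicates(records))  # {'GAGCG': [['s1', 's3'], ['s2']]}
```

An empty result means every set of identical reads was aligned identically; any entry points directly at the copies worth inspecting.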