Too many bases eliminated after alignment

Claudia · May 8, 2012, 3:45pm

Dear all,

When analyzing my 454 data (16S), after using either of these commands:

mothur > align.seqs(fasta=crdd1.shhh.trim.unique.fasta, reference=silva.bacteria.fasta, processors=2)
mothur > align.seqs(fasta=crdd1.shhh.trim.unique.fasta, reference=silva.bacteria.fasta, flip=T, processors=2)

I get the following messages:

Some of you sequences generated alignments that eliminated too many bases, a list is provided in crdd1.shhh.trim.unique.flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well. It took 186 secs to align 25814 sequences.
Some of you sequences generated alignments that eliminated too many bases, a list is provided in crdd1.shhh.trim.unique.flip.accnos. If the reverse compliment proved to be better it was reported. It took 378 secs to align 25814 sequences.

My question is: should I modify the command somehow to avoid eliminating too many bases? Or is it acceptable to continue with the downstream analysis using the files generated by one of the commands I used, even though a considerable number of bases were eliminated?

Thanks

pschloss · May 14, 2012, 12:24pm

What does the output look like? How many sequences show up in the flip.accnos file? Sometimes spurious DNA fragments get through to this point and because they aren’t 16S sequences, weird things happen. If there aren’t a ton of them and you’re sure the sequences are in the right direction, I wouldn’t worry. They’ll get culled when you run screen.seqs.
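To put a number on "how many show up", a rough Python sketch like the one below can compare the flagged names against the total sequence count. It is not part of mothur; the function names are made up for illustration, and it assumes the accnos file holds one entry per non-empty line and that every FASTA record starts with ">".

```python
def count_accnos_entries(lines):
    """Count non-empty lines; assumes one flagged sequence per line."""
    return sum(1 for line in lines if line.strip())


def count_fasta_records(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def flagged_fraction(accnos_lines, fasta_lines):
    """Return (flagged, total, fraction of sequences flagged)."""
    flagged = count_accnos_entries(accnos_lines)
    total = count_fasta_records(fasta_lines)
    fraction = flagged / total if total else 0.0
    return flagged, total, fraction


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real files (hypothetical data).
    accnos = ["seq2\n"]
    fasta = [">seq1\n", "ACGT\n", ">seq2\n", "TTGA\n"]
    flagged, total, fraction = flagged_fraction(accnos, fasta)
    print(f"{flagged} of {total} sequences flagged ({fraction:.2%})")
```

With real files, pass open file handles instead of the lists, for example `flagged_fraction(open("crdd1.shhh.trim.unique.flip.accnos"), open("crdd1.shhh.trim.unique.fasta"))`.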

Pat

dackfc · June 14, 2012, 10:19am

Rather than start a new thread I figure this is an appropriate topic for me to post about a similar problem.

When we run align.seqs and then get the summary, we are left with very few sequences, of which the smallest is 0 bp, the largest is 60 bp, and the median is 13 bp. Prior to this command we have no problems, with large numbers of sequences all 200-350 bp. When I view the accnos file there are many sequence names in it, but what concerns us most is that the fasta file generated by align.seqs consists mostly of “.” with the odd nucleotide base. For example…

…A… …and so on…

We have tried many approaches to the align.seqs to overcome this but have been unsuccessful. Is there something we are doing wrong or is this because the data we get back from the lab is poor?
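For anyone wanting to check this kind of output outside mothur, here is a minimal Python sketch that counts how many real bases each aligned sequence keeps. It assumes "." and "-" are the only gap characters in the aligned fasta file; the function names are illustrative, not mothur's.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def ungapped_length(aligned_seq):
    """Count characters that are not alignment gaps ('.' or '-')."""
    return sum(1 for ch in aligned_seq if ch not in ".-")


def base_counts(lines):
    """Map each sequence name to the number of bases left after alignment."""
    return {name: ungapped_length(seq) for name, seq in parse_fasta(lines)}


if __name__ == "__main__":
    # Hypothetical aligned records, not real data.
    demo = [">seqA\n", "..AC-GT..\n", ">seqB\n", ".....A...\n"]
    for name, n_bases in base_counts(demo).items():
        print(name, n_bases)
```

Comparing these counts with the 200-350 bp lengths seen before alignment shows how many sequences lost nearly all of their bases.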

Any help is greatly appreciated.

thanks,
Chris

pschloss · June 15, 2012, 5:23am

This may be happening because in trim.seqs you used flip=T when you sequenced from left to right. If you sequence from left to right (i.e. from the 5’ to 3’ end of the gene) then you don’t want to use flip=T.
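As a reminder of what flipping does to a read, here is a small Python sketch of reverse-complementing a sequence. It only illustrates the idea and is not mothur's implementation; the handling of U and N is an assumption made for this example.

```python
# Translation table for the complement step; U is treated like T, N stays N.
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    read = "AACG"
    print(read, "->", reverse_complement(read))
```

A read that was already in the 5’ to 3’ orientation ends up reversed after this operation, so it no longer matches a reference that is in the forward orientation, which is consistent with alignments that keep almost no bases.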

Claudia · July 12, 2012, 10:14pm

Thanks Pat, there were just 4 sequences in the accnos file out of a total of 90944 sequences. I guess I am fine; a similar thing happens when analyzing 18S datasets.
