Warning Some of your sequences generated alignments that eliminated too many bases

mothur > align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.bacteria.pcr.fasta)

produced the following message:
[WARNING]: Some of your sequences generated alignments that eliminated too many bases, a list is provided in …flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well.

The flip.accnos file has 32794 entries.

I re-ran align.seqs on the same files with flip=t added.

I got the same result, and the new flip.accnos has the same 32794 entries in it. I went ahead and ran:

mothur > summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table)

And got

             Start    End      Nbases   Ambigs  Polymer  NumSeqs
Min          0        0        0        0       1        1
2.5%-tile    2        17012    42       0       4        100276
25%-tile     2        17012    425      0       4        1002754
Median       2        17012    425      0       4        2005507
75%-tile     2        17012    425      0       4        3008260
97.5%-tile   16047    17012    425      0       6        3910737
Max          17012    17012    431      0       142      4011012
Mean         555.096  16926.3  405.843  0       4.23103

unique seq 1315681

#seqs 4011012

I don’t know if this looks ok or not. Do I just ignore the 32794 sequences that were unalignable? Reverse complementing didn’t solve the problem. Is this normal? (As an aside, in interpreting the above table, in the max row, how can sequences start at 17012, end at 17012, yet have 431 Nbases?)

That does seem high. What are you trying to align - sample, region, chemistry, etc?

For interpreting the table, each column is summarized separately. So you have a sequence that starts as late as 17012, a sequence that ends as late as 17012, and a sequence that is at most 431 bases long. Those may or may not be the same sequence. You can look at the output summary file to get a sense of those sequences. I suspect there is a sequence that starts and ends at 17012 and is 1 nt long.
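
One way to check that suspicion is to scan the per-sequence summary file for rows where the start and end positions coincide. The sketch below is only an illustration: it assumes the summary file is tab-delimited with a header row containing the columns seqname, start, end, nbases, ambigs, polymer and numSeqs, and the function name and the example rows are made up rather than taken from this dataset.

```python
import csv
import io


def find_collapsed_alignments(summary_text):
    """Return (seqname, start, end, nbases) for rows where start == end.

    Assumes a tab-delimited summary with a header row that includes
    the columns 'seqname', 'start', 'end' and 'nbases' (an assumption
    about the file layout, not something confirmed in this thread).
    """
    reader = csv.DictReader(io.StringIO(summary_text), delimiter="\t")
    collapsed = []
    for row in reader:
        start = int(row["start"])
        end = int(row["end"])
        if start == end:
            collapsed.append((row["seqname"], start, end, int(row["nbases"])))
    return collapsed


# Hypothetical rows, invented for illustration only.
example = (
    "seqname\tstart\tend\tnbases\tambigs\tpolymer\tnumSeqs\n"
    "seqA\t2\t17012\t425\t0\t4\t120\n"
    "seqB\t17012\t17012\t1\t0\t1\t1\n"
    "seqC\t0\t0\t0\t0\t1\t1\n"
)

for name, start, end, nbases in find_collapsed_alignments(example):
    print(name, start, end, nbases)
```

To run it on a real file, read the file contents into a string and pass that to the function in place of the example text.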


Hi Dr. Schloss,
Thank you so much for taking the time to help. I have 4 sets of samples, and a different primer set was used for each set. All samples are bacterial communities from 10 different deep-sea mussels. The set I am currently analyzing was run with the standard Illumina MiSeq primers from their 16S protocol, so it comprises 10 different mussels with Illumina primers targeting the V3 and V4 regions of 16S. Am I right to interpret that, out of 4011012 sequences, 32794 were unaligned, or does the 4011012 exclude the unaligned ones? If you were a reviewer and saw this result, would you assume there is something wrong with the way I’ve done the analysis?

That’s less than 1% of your sequences. That doesn’t sound too bad. You might BLAST some of those sequences individually and see what they are.
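
For reference, the proportions can be worked out from the numbers posted in this thread. The sketch below is a rough check, not a definitive accounting. Because align.seqs was run on the unique fasta file, the flip.accnos entries are names of unique sequences, so both denominators are shown. How many total reads the flagged sequences represent is not known from the thread; that would have to come from the count table.

```python
# Numbers copied from the thread above.
flagged_unique = 32794      # entries in flip.accnos
unique_seqs = 1315681       # unique sequences reported by summary.seqs
total_seqs = 4011012        # total sequences reported by summary.seqs

# Flagged entries relative to the total read count.
pct_of_total = 100 * flagged_unique / total_seqs

# Flagged entries relative to the number of unique sequences.
pct_of_unique = 100 * flagged_unique / unique_seqs

print(f"{pct_of_total:.2f}% of total sequences")
print(f"{pct_of_unique:.2f}% of unique sequences")
```

Either way the flagged fraction is a few percent at most, but the two ratios answer different questions, so it helps to state which one is being reported.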