Warning: Some of your sequences generated alignments that eliminated too many bases

mothur > align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.bacteria.pcr.fasta)

produced the following message:
[WARNING]: Some of your sequences generated alignments that eliminated too many bases, a list is provided in …flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well.

The flip.accnos file has 32794 entries.

I re-ran the same files with flip=t added.

I got the same result, and the new flip.accnos file has the same 32794 entries. I went ahead and ran

mothur > summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table)

and got:

             Start    End      Nbases   Ambigs  Polymer  NumSeqs
Min          0        0        0        0       1        1
2.5%-tile    2        17012    42       0       4        100276
25%-tile     2        17012    425      0       4        1002754
Median       2        17012    425      0       4        2005507
75%-tile     2        17012    425      0       4        3008260
97.5%-tile   16047    17012    425      0       6        3910737
Max          17012    17012    431      0       142      4011012
Mean         555.096  16926.3  405.843  0       4.23103

# of unique seqs: 1315681

total # of seqs: 4011012

I don’t know whether this looks OK or not. Do I just ignore the 32794 sequences that were unalignable? Reverse complementing didn’t solve the problem. Is this normal? (As an aside, in interpreting the table above: in the Max row, how can sequences start at 17012 and end at 17012, yet have 431 Nbases?)

That does seem high. What are you trying to align (sample type, region, chemistry, etc.)?

To interpret the table, read each column separately. You have sequences that start as late as position 17012, sequences that end as late as 17012, and sequences that are at most 431 bases long. Those may or may not be the same sequence. You can look at the output summary file to get a sense of those sequences. I suspect there is a sequence that starts and ends at 17012 and is 1 nt long.

Pat
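To follow up on looking at the output summary file, here is a minimal Python sketch that lists sequences whose alignment starts and ends at the same position or covers very few bases. It assumes the .summary file is tab-separated with a header row naming the columns seqname, start, end and nbases; the function name odd_alignments and the min_bases cutoff of 10 are illustrative choices, not mothur settings.

```python
import csv


def odd_alignments(lines, min_bases=10):
    """Yield (name, start, end, nbases) for suspicious rows of a mothur .summary file.

    Assumption: tab-separated text with a header row that includes the
    columns seqname, start, end and nbases. A row is flagged when the
    alignment starts and ends at the same position, or when it has fewer
    than min_bases bases (an arbitrary, illustrative cutoff).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        start = int(row["start"])
        end = int(row["end"])
        nbases = int(row["nbases"])
        if start == end or nbases < min_bases:
            yield row["seqname"], start, end, nbases
```

For example, opening stability.trim.contigs.good.unique.summary (the name assumes mothur's usual output naming) with `open(path, newline="")` and passing the handle to `odd_alignments` prints one tuple per suspicious sequence.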

Hi Dr. Schloss,
Thank you so much for taking the time to help. I have 4 sets of samples, and a different primer set was used for each set. All samples are bacterial communities from 10 different deep-sea mussels. The set I am currently analyzing was run with the standard Illumina MiSeq primers from their 16S protocol, so this set is 10 different mussels with Illumina primers targeting the V3 and V4 regions of the 16S rRNA gene. Am I right to interpret that out of 4011012 sequences, 32794 were unaligned, or does the 4011012 exclude the unaligned ones? If you were a reviewer and saw this result, would you assume there is something wrong with the way I’ve done the analysis?

That’s less than 1% of your sequences, which doesn’t sound too bad. You might BLAST some of those sequences individually to see what they are.
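As a check on the arithmetic, 32794 / 4011012 is about 0.82%. The flip.accnos file lists unique sequence names, though, and each unique can stand for many reads (32794 is about 2.5% of the 1315681 unique sequences), so weighting the flagged names by their abundance in the count_table gives a more direct read-level fraction. Below is a minimal Python sketch, assuming the count_table is tab-separated with one header row, the sequence name in the first column and the total abundance in the second; lines starting with # are skipped in case the table is in a compressed format. The function names are hypothetical.

```python
def read_accnos(lines):
    """Return the set of sequence names listed one per line in an .accnos file."""
    return {line.split()[0] for line in lines if line.strip()}


def read_counts(lines):
    """Map each representative sequence name to its total abundance.

    Assumption: tab-separated count_table, first non-'#' line is the header,
    column 1 is the name and column 2 is the total count.
    """
    counts = {}
    header_seen = False
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        if not header_seen:
            header_seen = True  # skip the header row
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[0]] = int(fields[1])
    return counts


def flagged_fraction(accnos_lines, count_lines):
    """Return (flagged reads, total reads, flagged / total)."""
    flagged = read_accnos(accnos_lines)
    counts = read_counts(count_lines)
    flagged_reads = sum(counts.get(name, 0) for name in flagged)
    total_reads = sum(counts.values())
    return flagged_reads, total_reads, flagged_reads / total_reads
```

Pass it open file handles for the flip.accnos file and stability.trim.contigs.good.count_table; the third value is the share of all reads whose unique sequence was flagged.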

Hi,

I have a similar question about too many bases being eliminated. I am working with gut samples sequenced across the V3-V4 region. When I align these sequences to the Silva.v3_4 reference file, I get a similar warning message.

This is the output from my log file. I am running mothur version 1.48.0 (Linux) on an HPC cluster as a batch job. Since flip=T is already the default in the new version, I have not specified it in the command.

mothur > align.seqs(fasta=current, reference=Silva.v3_4.fasta)
Using /scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.fasta as input file for the fasta parameter.

Using 36 processors.

Reading in the /scratch/leuven/341/vsc34148/V3_V4/Silva.v3_4.fasta template sequences… DONE.
It took 61 to read 213119 sequences.

Aligning sequences from /scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.fasta …
It took 38084 secs to align 15860288 sequences.
[WARNING]: 4426786 of your sequences generated alignments that eliminated too many bases, a list is provided in /scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.flip.accno$
[NOTE]: 4388639 of your sequences were reversed to produce a better alignment.

It took 38084 seconds to align 15860288 sequences.

Output File Names:
/scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.align
/scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.align_report
/scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.flip.accnos

mothur > summary.seqs(fasta=current, count=current)
Using /scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.count_table as input file for the count parameter.
Using /scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.align as input file for the fasta parameter.

Using 36 processors.

             Start   End     NBases  Ambigs  Polymer  NumSeqs
Minimum:     0       0       0       0       1        1
2.5%-tile:   1       18983   440     0       4        637607
25%-tile:    1       18983   458     0       5        6376061
Median:      1       18985   465     0       5        12752121
75%-tile:    55      18985   469     0       6        19128181
97.5%-tile:  55      18985   469     0       6        24866635
Maximum:     18985   18985   480     0       8        25504240
Mean:        62      18968   459     0       5

# of unique seqs: 15860288

total # of seqs: 25504240

It took 646 secs to summarize 25504240 sequences.

Output File Names:
/scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.summary

Any suggestions would be very helpful!

Many thanks,
Aditi

Can you post this in its own thread? The original thread is 6 years old, and mothur has changed a lot in the meantime.

At first glance, I’d wonder whether you truly have V3-V4 sequences, or whether you didn’t remove the primers in make.contigs but did remove them when making Silva.v3_4.fasta.

Pat
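One way to test that first-glance suspicion is to count how many contigs still begin and end with primer sequence. Below is a minimal Python sketch. The primer strings are the 341F/805R pair commonly listed for Illumina's 16S V3-V4 protocol and are an assumption here, so substitute the primers from your own protocol. The code matches IUPAC degenerate bases, checks the forward primer at the 5' end and the reverse complement of the reverse primer at the 3' end, and also tries the reverse complement of the whole read, in the spirit of flip=T. All function names are hypothetical.

```python
# Assumed primers (Illumina 16S V3-V4, 341F / 805R); replace with your own.
FWD = "CCTACGGGNGGCWGCAG"
REV = "GACTACHVGGGTATCTAATCC"

# IUPAC code -> set of concrete bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, keeping IUPAC degenerate codes degenerate."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def matches(primer, segment):
    """True if segment matches the (possibly degenerate) primer base by base."""
    return len(segment) == len(primer) and all(
        base in IUPAC[p] for p, base in zip(primer.upper(), segment.upper())
    )


def has_primers(seq, fwd=FWD, rev=REV):
    """Forward primer at the 5' end and revcomp(reverse primer) at the 3' end."""
    return matches(fwd, seq[: len(fwd)]) and matches(revcomp(rev), seq[-len(rev):])


def has_primers_either_strand(seq, fwd=FWD, rev=REV):
    """Also try the reverse complement of the read."""
    return has_primers(seq, fwd, rev) or has_primers(revcomp(seq), fwd, rev)


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA text."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)
```

For example, summing `has_primers_either_strand(s)` over `read_fasta(fh)` for V3_V4.trim.contigs.good.unique.fasta counts unique contigs that still carry both primers; a large count would be consistent with primers left in place by make.contigs.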