Warning: 4426786 of your sequences generated alignments that eliminated too many bases

Hi Pat,

Posting it here as a new thread.
And yes, I did remove the primers when making the Silva.v3_4 reference file.

I have a similar question with regard to too many bases being eliminated. I am working with gut samples sequenced at the V3-V4 region. On aligning these sequences to the Silva.v3_4 file, I receive a similar warning message.

This is the output from my log file. I am running mothur v1.48.0 (Linux build) on our HPC cluster in batch mode. Since flip=T is already the default in the new version, I have not specified it in the command.

mothur > align.seqs(fasta=current, reference=Silva.v3_4.fasta)
Using /scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.fasta as input file for the fasta parameter.

Using 36 processors.

Reading in the /scratch/leuven/341/vsc34148/V3_V4/Silva.v3_4.fasta template sequences… DONE.
It took 61 to read 213119 sequences.

Aligning sequences from /scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.fasta …
It took 38084 secs to align 15860288 sequences.
[WARNING]: 4426786 of your sequences generated alignments that eliminated too many bases, a list is provided in /scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.flip.accno$
[NOTE]: 4388639 of your sequences were reversed to produce a better alignment.

It took 38084 seconds to align 15860288 sequences.

Output File Names:
/scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.align
/scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.align_report
/scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.flip.accnos

mothur > summary.seqs(fasta=current, count=current)
Using /scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.count_table as input file for the count parameter.
Using /scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.align as input file for the fasta parameter.

Using 36 processors.

             Start   End     NBases  Ambigs  Polymer  NumSeqs
Minimum:     0       0       0       0       1        1
2.5%-tile:   1       18983   440     0       4        637607
25%-tile:    1       18983   458     0       5        6376061
Median:      1       18985   465     0       5        12752121
75%-tile:    55      18985   469     0       6        19128181
97.5%-tile:  55      18985   469     0       6        24866635
Maximum:     18985   18985   480     0       8        25504240
Mean:        62      18968   459     0       5

# of unique seqs: 15860288
total # of seqs: 25504240

It took 646 secs to summarize 25504240 sequences.

Output File Names:
/scratch/leuven/341/vsc34148/V3_V4/V3_V4.trim.contigs.good.unique.summary
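
The percentile table above only hints at how the alignment coordinates are distributed (start positions 1 and 55 both show up). A minimal Python sketch for tallying reads per (start, end) pair from the .summary file is given here. It assumes the file is tab-delimited with the header columns seqname, start, end, nbases, ambigs, polymer and numSeqs; the helper name and the three sample rows are made up for illustration and are not taken from the real data.

```python
import csv
import io
from collections import Counter


def tally_coordinates(handle):
    """Sum numSeqs for every (start, end) pair in a summary.seqs table.

    Assumes a tab-delimited file whose header contains the columns
    'start', 'end' and 'numSeqs' (an assumption about the file layout).
    """
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        counts[(int(row["start"]), int(row["end"]))] += int(row["numSeqs"])
    return counts


# Illustrative rows only; replace with open("...unique.summary") on real data.
sample = io.StringIO(
    "seqname\tstart\tend\tnbases\tambigs\tpolymer\tnumSeqs\n"
    "seqA\t1\t18985\t465\t0\t5\t3\n"
    "seqB\t1\t18985\t469\t0\t6\t2\n"
    "seqC\t55\t18985\t469\t0\t6\t4\n"
)

counts = tally_coordinates(sample)
for (start, end), n in counts.most_common():
    print(start, end, n)
```

On the real file, the most common pairs would show which coordinates the bulk of the reads share and how many reads sit at the outlying positions.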

Any suggestions would be very helpful!

Many thanks,
Aditi

I suspect you can ignore the warning. It’s probably saying that it would remove too many bases if it used the aligned sequence without flipping. It looks like all of your sequences needed to be flipped. You could probably include flip=T in make.contigs to avoid this warning message.
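
As a rough check on this reading, the counts from the log can be compared directly. The short Python sketch below uses only the three numbers printed by align.seqs. It assumes that the reversed sequences are a subset of the sequences named in the warning, which the log itself does not state.

```python
# Counts copied from the align.seqs log above.
total_unique = 15_860_288   # unique sequences aligned
flagged = 4_426_786         # "eliminated too many bases" warning
flipped = 4_388_639         # "reversed to produce a better alignment"

# Share of all unique sequences that triggered the warning.
frac_flagged = flagged / total_unique

# Share of the flagged sequences that ended up reversed
# (assumes every reversed sequence was also a flagged one).
frac_flipped_of_flagged = flipped / flagged

# Flagged sequences that were NOT reversed, under the same assumption.
not_flipped = flagged - flipped

print(f"flagged:            {frac_flagged:.1%} of unique sequences")
print(f"flipped of flagged: {frac_flipped_of_flagged:.1%}")
print(f"flagged, not flipped: {not_flipped}")
```

Under that assumption, roughly 28% of the unique sequences raised the warning and about 99% of those were reversed, leaving 38147 flagged sequences that were not flipped.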

In anticipation, you probably will want to check out this blog post…

Pat

Thank you Pat!
Fortunately/unfortunately, I am already aware of this blog post, so I expect a very large distance matrix.
Hopefully, by reducing my cut-off value, I will be able to get the dist file down to a manageable size. Fingers crossed!
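
To put "very large" into numbers, here is a back-of-the-envelope Python sketch. It counts the pairwise comparisons among n sequences as n(n-1)/2, using the unique-sequence count from the log above. This is only an upper bound for the data as it stands at the alignment step: any later screening, filtering or de-replication would lower n, and a cut-off reduces how many distances are kept rather than how many pairs exist.

```python
def pairwise_count(n: int) -> int:
    """Number of distinct unordered pairs among n sequences."""
    return n * (n - 1) // 2


# Unique sequences reported by summary.seqs in the log above.
n_unique = 15_860_288
pairs = pairwise_count(n_unique)

print(f"{n_unique:,} unique sequences -> {pairs:.3e} possible pairs")

# Halving n cuts the number of pairs by roughly a factor of four,
# which is why reducing the unique-sequence count matters so much.
ratio = pairs / pairwise_count(n_unique // 2)
print(f"pairs shrink by about {ratio:.1f}x when n is halved")
```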