Issues with align.seqs: Eliminated bases warning and failed screen.seqs

fleal174 · December 11, 2024, 10:11pm

Hi there, I am quite new to mothur and keep running into the same issue with my dataset, so I am looking for some advice.

I have a large dataset (62 samples) of 16S rRNA sequences from the V3-V4 region, amplified with the following primers:
341F Primer CCTACGGGNGGCWGCAG
806R Primer GGACTACNVGGGTWTCTAAT
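
As a quick sanity check on primers like these, the degenerate IUPAC codes (N, W, V) can be expanded into a regular expression to test whether a raw read begins with a primer. This is a minimal Python sketch and not part of mothur; the helper names and the example read are made up for illustration.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

FORWARD_341F = "CCTACGGGNGGCWGCAG"
REVERSE_806R = "GGACTACNVGGGTWTCTAAT"


def primer_to_regex(primer):
    """Compile a degenerate primer into a regex that matches any concrete variant."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def starts_with_primer(read, primer):
    """Return True if the read begins with some expansion of the primer."""
    return primer_to_regex(primer).match(read.upper()) is not None


# Hypothetical read, invented for this example.
example_read = "CCTACGGGAGGCAGCAGTGGGGAATATTG"
print(starts_with_primer(example_read, FORWARD_341F))
```

If only a small fraction of raw reads start with either primer, that would be one hint that the run contains something other than the expected amplicon.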

The problem comes when I get to align.seqs where I get this warning:

It took 6298 secs to align 20287188 sequences.
[WARNING]: 17744737 of your sequences generated alignments that eliminated too many bases, a list is provided in Astability.trim.contigs.good.unique.flip.accnos.
[NOTE]: 10000155 of your sequences were reversed to produce a better alignment.
It took 6299 seconds to align 20287188 sequences.

This is the summary from the files generated:

Then screen.seqs fails:

	It took 4751 secs to screen 20287188 sequences, removed 20287188.

	/******************************************/
	Running command: remove.seqs(accnos=Astability.trim.contigs.good.unique.bad.accnos.temp, count=Astability.trim.contigs.good.count_table)
	Removed 27204442 sequences from Astability.trim.contigs.good.count_table.
	[WARNING]: Astability.trim.contigs.good.count_table contains only sequences from the .accnos file.

	Output File Names:
	Astability.trim.contigs.good.pick.count_table

	/******************************************/

	Output File Names:
	Astability.trim.contigs.good.unique.good.align
	Astability.trim.contigs.good.unique.bad.accnos
	Astability.trim.contigs.good.good.count_table


	It took 13904 secs to screen 20287188 sequences.

I have tried a number of things to get this to stop happening. I’ve double-checked my SILVA reference and coordinates, and I’ve also run Trimmomatic on the raw fastq files and then repeated the pipeline up to align.seqs with the trimmed reads. I will try continuing the pipeline with the files from before align.seqs, but I would really like to know what I’ve done incorrectly.

pschloss · December 12, 2024, 3:49pm

Hi there,

What are you using to align your sequences? The quality of the alignment or the sequences looks pretty bad. Not many of the sequences are the right length. More than half are 14 or so nucleotides long.

Can you clarify which reference file you are using, and the exact commands and syntax you are running upstream of screen.seqs?

Also, I generally discourage people from sequencing with the 2x300 chemistry as well as the V3-V4 region.

Pat

fleal174 · December 12, 2024, 9:53pm

Thanks for your help Pat,

The reference I am using is silva.seed_v138_2.align. I followed the steps in this blog post: Customize your reference alignment for your favorite region

I downloaded an E. coli FASTA file from NCBI, trimmed it to my primers, and saved it as FINALecoli.fasta. I then aligned this to the SILVA seed reference:

mothur > align.seqs(fasta=FINALecoli.fasta, reference=silva.seed_v138_2.align)

And then made a summary:

mothur > summary.seqs(fasta=FINALecoli.align)

Next, I trimmed the SILVA reference to the coordinates:

mothur > pcr.seqs(fasta=silva.seed_v138_2.align, start=6388, end=25318, keepdots=FALSE)

Output file: silva.seed_v138_2.pcr.align, which I renamed to SILVA.fasta.
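
The start and end columns that summary.seqs reports for an aligned sequence are, in effect, the alignment positions of its first and last bases. Here is a minimal Python sketch of that idea, assuming `.` and `-` are the only gap characters and that positions are 1-based; the function name and the toy sequence are illustrative only.

```python
GAP_CHARS = {".", "-"}


def aligned_span(aligned_seq):
    """Return (start, end) as 1-based columns of the first and last base, or None if all gaps."""
    positions = [i + 1 for i, ch in enumerate(aligned_seq) if ch not in GAP_CHARS]
    if not positions:
        return None
    return positions[0], positions[-1]


# Toy aligned sequence: 4 leading pad columns, 8 alignment columns, 4 trailing pad columns.
toy = "....AC-GT--A...."
print(aligned_span(toy))  # (5, 12)
```

Applied to the aligned E. coli fragment, the same logic would give the coordinates that are then passed to pcr.seqs as start and end.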

Finally I aligned my fasta to the reference:

mothur > align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=SILVA.fasta)

And then I did screen.seqs as in my first post.

Apologies if I have left out any information, I am still new to using mothur. But please do let me know if you have any other questions, I appreciate the help!

Ally

pschloss · December 12, 2024, 11:20pm

Could you try aligning it to Silva seed without customizing the region and post the output of summary.seqs?

fleal174 · December 15, 2024, 7:48am

Thanks again,
I’ve aligned my sequences to the reference silva seed without customising:

mothur > align.seqs(fasta=Astability.trim.contigs.good.unique.fasta, reference=silva.seed_v138_2.align)

Again this resulted in a similar warning: "17612804 of your sequences generated alignments that eliminated too many bases… 9841274 of your sequences were reversed…"

Here is the summary:
mothur > summary.seqs(fasta=Astability.trim.contigs.good.unique.align, count=Astability.trim.contigs.good.count_table)

And I’ve just run: screen.seqs(fasta=Astability.trim.contigs.good.unique.align, count=Astability.trim.contigs.good.count_table, start=1000, end=40000)

Which gave this:
It took 10411 secs to screen 20287188 sequences, removed 20287188.

Running command: remove.seqs(accnos=Astability.trim.contigs.good.unique.bad.accnos.temp, count=Astability.trim.contigs.good.count_table)

So the screening removed all of the sequences. Would there be much point to me redoing the screen.seqs with a wider interval? It has been taking about 5 hours to run each command so it’s difficult to tinker about. Any advice is welcomed.

Thanks very much,
Ally

fleal174 · December 15, 2024, 8:21pm

Update:

Following on from screen.seqs(fasta=Astability.trim.contigs.good.unique.align, count=Astability.trim.contigs.good.count_table, start=1000, end=40000):

It took 10411 secs to screen 20287188 sequences, removed 20287188.
Output File Names:
Astability.trim.contigs.good.unique.good.align (empty)
Astability.trim.contigs.good.unique.bad.accnos
Astability.trim.contigs.good.good.count_table (doesn’t exist)

I then tried to run:
mothur > filter.seqs(fasta=Astability.trim.contigs.good.unique.align, vertical=T, trump=.)
(I used the output file from align.seqs, since the empty one from screen.seqs couldn’t be used.)

However, it removed all columns from the alignment, so I am unsure what to do next.

Ally
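
One plausible reason for the empty filter.seqs result: with trump=., any column in which at least one sequence has a `.` is dropped, so if the sequences cover very different parts of the alignment, no column survives. Below is a toy Python sketch of that column rule. It is not mothur's implementation, and the function name and sequences are made up for illustration.

```python
def filter_columns(aligned, trump=".", gap_chars=".-"):
    """Drop columns containing the trump character in any sequence, and columns that are all gaps."""
    n_cols = len(aligned[0])
    keep = []
    for col in range(n_cols):
        column = [seq[col] for seq in aligned]
        if trump in column:
            continue  # trump rule: one '.' removes the whole column
        if all(ch in gap_chars for ch in column):
            continue  # vertical rule: a column of only gaps is uninformative
        keep.append(col)
    return ["".join(seq[c] for c in keep) for seq in aligned]


# Two sequences covering the same region keep their shared columns.
overlapping = ["..ACGT-A..", "..AC-TGA.."]
# Two sequences covering different regions lose every column.
disjoint = ["ACGT......", "......TGCA"]

print(filter_columns(overlapping))  # ['ACGT-A', 'AC-TGA']
print(filter_columns(disjoint))     # ['', '']
```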

shot89_1000 · December 16, 2024, 10:32am

So you are sequencing the V3-V4 region, which is approximately 460 bp. As Pat said, you may have a problem in the alignment (with the make.contigs command)… In summary, you are probably sequencing a 460 bp fragment with a paired-end approach that reads 250 nt from each end. That is cutting it very close, and, well, Pat explains it very well in his post, so please check Why do I have such a large distance matrix
If you review your summary table, you have 680k sequences (of your 27M) that are longer than 400 bp (I’m guessing even fewer are >460 bp…). Anyway, all of them start at around 43000 and end at 43116, so if you remove everything outside the range 1000 to 40000 you are basically deleting everything that may be the correct size.
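
The overlap arithmetic behind this can be sketched in a few lines of Python. The 460 bp amplicon and 250 nt reads are the approximate figures from this post, and the function itself is only an illustration.

```python
def expected_overlap(read_length, amplicon_length):
    """Nominal number of bases covered by both reads of a pair."""
    raw = 2 * read_length - amplicon_length
    # Overlap cannot be negative, and cannot exceed a read or the amplicon itself.
    return max(0, min(raw, read_length, amplicon_length))


print(expected_overlap(250, 460))  # 40
```

This is the nominal figure only. Low-quality bases at the 3' ends of the reads fall inside that overlap, so there is little redundancy left to correct errors when the reads are assembled.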

pschloss · December 16, 2024, 1:48pm

Hi again,

I think you’re seeing exactly what we saw in Table 2 of Kozich with this region. To start, you have very little overlap between the two reads (~75 nt) and if I had to guess you had a bad sequence run. After assembling the reads with make.contigs you then probably removed a lot of sequences that had ambiguous bases, right?

Looking at your most recent output from summary.seqs, more than 25% of your sequences end near the 5’ end of the gene (see end values <= 1096; the bases start at 1044) and more than 25% of your sequences start near the 3’ end of the gene (see start values >= 43061). I suspect more than 50% of your reads fall in these two areas.

You found that the V3-V4 region started at 6388 and ended at 25318. Those values seem right. Unfortunately, none of those values show up in the output from summary.seqs. That doesn’t mean they aren’t there, but there aren’t many of them, which supports what you’re seeing. When you run screen.seqs with the full alignment database you should be using start=6388 and end=25318. The start value is the alignment position that a sequence must start at or before, and the end value is the alignment position that a sequence must end at or after.
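
The start/end rule can be written as a small predicate. This is a Python sketch of the rule as described here, not mothur's code; the function name is hypothetical and the example span simply reuses the 6388 and 25318 coordinates.

```python
def passes_screen(seq_start, seq_end, start, end):
    """A sequence is kept only if it starts at or before `start` and ends at or after `end`."""
    return seq_start <= start and seq_end >= end


# A read that spans exactly the V3-V4 coordinates quoted in this thread.
v3v4_read = (6388, 25318)

print(passes_screen(*v3v4_read, start=6388, end=25318))  # True
print(passes_screen(*v3v4_read, start=1000, end=40000))  # False
```

The second call shows why start=1000 and end=40000 would remove even a perfectly good V3-V4 read: it neither starts by column 1000 nor extends to column 40000.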

I wonder if you actually sequenced what you think you did. I’m curious what happens when you take one of your unaligned sequences and run it through blast. Does it come back as a 16S? I have a suspicion that it might be PhiX (or someone else’s project).
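
To pull a few sequences out of the unaligned FASTA file for a BLAST search, something like the following Python sketch would do. The function name is hypothetical, and the demo uses a small in-memory file rather than real data.

```python
import io


def first_records(handle, n=5):
    """Return up to n (header, sequence) pairs from a FASTA file handle."""
    if n <= 0:
        return []
    records = []
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                records.append((header, "".join(chunks)))
                if len(records) == n:
                    return records
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Flush the last record in the file.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up two-record FASTA used only to demonstrate the parser.
demo = io.StringIO(">read1\nACGT\nACGT\n>read2\nTTGA\n")
for name, seq in first_records(demo, n=2):
    print(name, len(seq), seq)
```

With the real data, the handle would come from `open("Astability.trim.contigs.good.unique.fasta")`, and the printed sequences can be pasted into a BLAST search.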

Pat
