problems with filter.seqs

Yanfei · March 25, 2015, 7:12pm

Hi, I am experiencing some problems with filter.seqs.
My data are 454 reads of the 16S V1-V3 region, around 500 bp long.
I analyzed the data according to the 454 mothur SOP.
However, after running the filter.seqs command
mothur > filter.seqs(fasta=trim.unique.good.align,vertical=T,trump=.,processors=2)
it reported the following:
Length of filtered alignment: 8426
Number of columns removed: 4391
Length of the original alignment: 12817
Number of sequences used to construct filter: 416927
Then I ran
mothur > summary.seqs(fasta=trim.good.filter.fasta)
and it showed this:
            Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:    1        8426     216      0       3        1
2.5%-tile:  1        8426     230      0       3        10424
25%-tile:   1        8426     249      0       4        104232
Median:     1        8426     250      0       5        208464
75%-tile:   1        8426     255      0       5        312696
97.5%-tile: 1        8426     258      0       7        406504
Maximum:    3        8426     285      0       31       416927
Mean:       1.0001   8426     248.723  0       4.67198
# of Seqs:  416927

It seems all the reads are now only about half of their original length.

Before the filter.seqs step, the read lengths were normal.
Is there anything wrong with my command settings?

dwaite · March 26, 2015, 4:00am

What does the summary.seqs output look like before you run screen.seqs? If a few short sequences are getting through your screening step then they’ll reduce the number of base positions preserved during filtering.

Yanfei · March 26, 2015, 2:09pm

This is how the reads look before screen.seqs:
mothur > summary.seqs(fasta=trim.unique.align,name=trim.unique.names)

Using 8 processors.

            Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:    1        4        2        0       1        1
2.5%-tile:  2        12817    210      0       4        11247
25%-tile:   2        12817    440      0       5        112466
Median:     14       12817    490      0       5        224931
75%-tile:   697      12817    505      0       5        337396
97.5%-tile: 5071     12817    515      0       7        438615
Maximum:    12815    12817    570      0       31       449861
Mean:       852.927  12769.2  447.885  0       4.98815
# of unique seqs: 449861
total # of seqs:  449861

The read lengths look good, with a median of 490 bp.
After screen.seqs, the median read length is still around 494 bp:
mothur > summary.seqs(fasta=trim.unique.good.align, name=trim.unique.good.names)

Using 8 processors.

            Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:    1        12817    227      0       3        1
2.5%-tile:  2        12817    292      0       4        10424
25%-tile:   2        12817    450      0       5        104232
Median:     9        12817    494      0       5        208464
75%-tile:   167      12817    506      0       5        312696
97.5%-tile: 4235     12817    515      0       7        406504
Maximum:    4392     12817    570      0       31       416927
Mean:       542.95   12817    464.192  0       5.06174
# of unique seqs: 416927
total # of seqs:  416927

So I think filter.seqs is where the problem is, but I am not sure what should be corrected.

dwaite · March 26, 2015, 7:03pm

I think your problem is that after you screen the sequences, you still have a minimum sequence length of 227 bp in your data set. Since filtering with trump=. only retains alignment positions that are present in all of your sequences, every sequence in your data set is effectively trimmed to the region covered by the shortest sequence.
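
To make the mechanism concrete, here is a minimal Python sketch of the column filtering described above. It is a toy model, not mothur's implementation: the function name `filter_columns` and the three toy sequences are made up for illustration, and it assumes that positions before a read's start (or after its end) are written as `.` in the aligned fasta file while internal gaps are written as `-`.

```python
def filter_columns(alignment, trump=".", vertical=True):
    """Drop alignment columns, loosely mimicking filter.seqs (toy model only).

    trump:    a column is dropped if ANY sequence has this character in it.
    vertical: a column is dropped if ALL sequences have a gap ('-' or '.') in it.
    """
    n_cols = len(alignment[0])
    assert all(len(seq) == n_cols for seq in alignment), "sequences must be aligned"

    keep = []
    for col in range(n_cols):
        column = [seq[col] for seq in alignment]
        if trump is not None and trump in column:
            continue  # one sequence with '.' here removes the column for everyone
        if vertical and all(ch in "-." for ch in column):
            continue  # column carries no bases at all
        keep.append(col)
    return ["".join(seq[c] for c in keep) for seq in alignment]


# Hypothetical toy alignment: the third read starts late, so its first
# four positions are '.' padding.
toy = [
    "ACGT-AC-GTAC",
    "ACGTTAC-GTAC",
    "....-AC-GTAC",
]

filtered = filter_columns(toy)
for seq in filtered:
    print(seq)
# prints:
# -ACGTAC
# TACGTAC
# -ACGTAC
```

The single late-starting read costs the other two reads their first four bases. In the numbers posted above, 12817 - 8426 = 4391 columns were removed and the largest Start after screen.seqs is 4392, which would be consistent with everything before the latest-starting read being dropped.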

Looking at the output of your summary.seqs, 75% of your sequences are >440 bp long, so if you're willing to lose ~25% of your reads, you'll be able to roughly double the read length of your retained data.
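
The trade-off between reads kept and alignment length kept can be sketched in Python as follows. This is only an illustration under stated assumptions: `retained_columns` is a hypothetical helper, the (start, end) pairs are invented toy values in a 100-column alignment, and the column count is an upper bound because it ignores columns that vertical=T might remove as well.

```python
def retained_columns(seqs, max_start):
    """Estimate the effect of discarding reads that start after max_start.

    seqs: list of (start, end) alignment coordinates, 1-based and inclusive.
    Returns (fraction of reads kept, columns covered by every kept read).
    """
    kept = [(s, e) for s, e in seqs if s <= max_start]
    if not kept:
        return 0.0, 0
    latest_start = max(s for s, _ in kept)
    earliest_end = min(e for _, e in kept)
    # Only columns between the latest start and the earliest end are
    # free of '.' padding in all kept reads.
    columns = max(0, earliest_end - latest_start + 1)
    return len(kept) / len(seqs), columns


# Hypothetical reads: most start near column 1, two start late.
toy = [(1, 100), (1, 100), (2, 100), (3, 100), (40, 100), (55, 100)]

print(retained_columns(toy, max_start=100))  # keep everything: (1.0, 46)
print(retained_columns(toy, max_start=3))    # drop the two late reads: 4 of 6 kept, 98 columns
```

In mothur, a stricter screen like this would be applied with screen.seqs before rerunning filter.seqs, for example through its minlength or start parameters. Which cutoff is appropriate depends on how many reads you can afford to lose.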
