Hi, I am experiencing some problems with filter.seqs.
My reads is 454 reads of 16S v1-3 regions, around 500 bp long.
I analyzed the data according to 454 mothur SOP.
However, after the filter.seqs command mothur > filter.seqs(fasta=trim.unique.good.align,vertical=T,trump=.,processors=2)
it gave a description like this: Length of filtered alignment: 8426
Number of columns removed: 4391
Length of the original alignment: 12817
Number of sequences used to construct filter: 416927
then I run mothur > summary.seqs(fasta=trim.good.filter.fasta)
it showed like this:
_Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 8426 216 0 3 1
2.5%-tile: 1 8426 230 0 3 10424
25%-tile: 1 8426 249 0 4 104232
Median: 1 8426 250 0 5 208464
75%-tile: 1 8426 255 0 5 312696
97.5%-tile: 1 8426 258 0 7 406504
Maximum: 3 8426 285 0 31 416927
Mean: 1.0001 8426 248.723 0 4.67198
of Seqs: 416927_
it seems all the reads were only half of the original length.
Actually, before the filter.seqs step, the reads length is normal.
Is there anything wrong with my command settings?
What does the summary.seqs output look like before you run screen.seqs? If a few short sequences are getting through your screening step then they’ll reduce the number of base positions preserved during filtering.
total # of seqs: 449861_
the length of reads seems good, median 490bp.
After screen.seqs, the read length also is around 494 bp.
_mothur > summary.seqs(fasta=trim.unique.good.align, name=trim.unique.good.names)
I think your problem is that after you screen the sequences you still have a minimum sequence length of 227 bp in your data set. Since filtering only retains alignment positions common to all your sequences every sequence in your data set is effectively trimmed to the length of the shortest sequence.
Looking at the output of your summary.seqs, 75% of your sequences are >440 bp long so if you’re willing to lose ~25% of your reads you’ll be able to double the read length of your retained data.