filter.seqs

Hi there,
When I run:
mothur > filter.seqs(fasta=Lema_16S_adults.trim2.unique.good.align, vertical=T, trump=. )

I end up with sequences of 4 base pairs!!!..

Length of filtered alignment: 4
Number of columns removed: 49996
Length of the original alignment: 50000
Number of sequences used to construct filter: 13104

So I run:
mothur > filter.seqs(fasta=Lema_16S_adults.trim2.unique.good.align)
and I get my sequences back… what is happening?!
I decided to use the latter one… is this OK?!

Length of filtered alignment: 1178
Number of columns removed: 48822
Length of the original alignment: 50000
Number of sequences used to construct filter: 13104

Thanks!
kim

How are you running screen.seqs? Can you send us the output from summary.seqs using the fasta and name file you are giving screen.seqs?

Hi Pat,
thanks,
So basically, when I align my sequences I get:

mothur > align.seqs(fasta=Lema_16S_adults.trim2.unique.fasta, reference=silva.bacteria.fasta, flip=t)


             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1044     1048     1        0       1        1
2.5%-tile:   1044     3857     30       0       3        351
25%-tile:    1044     6389     279      0       4        3507
Median:      1044     8419     372      0       5        7014
75%-tile:    1044     10303    433      0       5        10520
97.5%-tile:  43007    43116    493      0       5        13676
Maximum:     43116    43116    500      0       5        14026
Mean:        3185.6   10294.7  341.795  0       4.67275
# of Seqs:   14026

so I screen like:
mothur > screen.seqs(fasta=Lema_16S_adults.trim2.unique.align, name=Lema_16S_adults.trim2.names, start=1044)
mothur > summary.seqs()


             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1044     1048     3        0       1        1
2.5%-tile:   1044     3855     155      0       4        328
25%-tile:    1044     6333     297      0       5        3277
Median:      1044     8411     379      0       5        6553
75%-tile:    1044     10261    437      0       5        9829
97.5%-tile:  1044     13862    493      0       5        12777
Maximum:     1044     14965    500      0       5        13104
Mean:        1044     8525     358.279  0       4.76389
# of Seqs:   13104

Then I had the problem when I ran dist.seqs and cluster(). I ended up making a Phylip distance matrix and cluster worked, but I don't really understand what my mistake was in all that... :roll:

Many thanks!!!

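So the earliest a sequence ends is at position 1048, which means you're keeping sequences that run from 1044 to just past 1048. With trump=., any column where even one sequence has a "." gets removed, so everything beyond the shortest sequence is thrown out. Voilà, a 4 bp alignment. Instead try this…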


screen.seqs(fasta=Lema_16S_adults.trim2.unique.align, name=Lema_16S_adults.trim2.names, start=1044, end=6333)
Also, it doesn't look like you're really doing much for quality trimming.
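
To make the effect of trump=. easier to see, here is a minimal Python sketch on a made-up 12-column alignment. It is not mothur's code: the function names `screen` and `filter_columns`, the toy sequences, and the treatment of both "." and "-" as gap characters for the vertical filter are all assumptions for illustration. The point is only that one short sequence is enough to wipe out almost every column when trump=. is used, and that screening on an end position first avoids this.

```python
GAPS = ".-"  # assumption: both characters count as gaps for the vertical filter


def screen(seqs, start, end):
    """Keep sequences that start at or before `start` and end at or after `end`.

    Positions are 1-based alignment columns, as in the summary output above.
    """
    kept = {}
    for name, aln in seqs.items():
        bases = [i for i, c in enumerate(aln, 1) if c not in GAPS]
        if bases and bases[0] <= start and bases[-1] >= end:
            kept[name] = aln
    return kept


def filter_columns(seqs, trump=None, vertical=True):
    """Drop columns containing `trump` in any sequence, and all-gap columns."""
    alns = list(seqs.values())
    keep = []
    for col in range(len(alns[0])):
        chars = [a[col] for a in alns]
        if trump is not None and trump in chars:
            continue  # one sequence with the trump character removes the column
        if vertical and all(c in GAPS for c in chars):
            continue  # column holds no bases at all
        keep.append(col)
    return {name: "".join(a[c] for c in keep) for name, a in seqs.items()}


# Hypothetical toy alignment: all sequences start at column 3,
# but "short" ends at column 4.
toy = {
    "long":  "..ACGTACGT..",
    "mid":   "..ACGTAC....",
    "short": "..AC........",
}

trumped = filter_columns(toy, trump=".")
print(len(trumped["long"]))    # 2 columns survive because of "short"

vertical_only = filter_columns(toy)
print(len(vertical_only["long"]))   # 8 columns, only the all-gap ones removed

screened = screen(toy, start=3, end=8)   # drops "short"
print(len(filter_columns(screened, trump=".")["long"]))   # 6 columns
```

In this toy case the end cutoff plays the same role as end=6333 above: it removes the sequences that stop early before the trump filter gets a chance to remove the columns.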

Pat

Thanks Pat :smiley:
Now everything works…!
I wish I could attend your August workshop, as I am quite new to this and really like mothur! Unfortunately I am on the other side of the world (Australia)… so a bit far!!
Cheers
kim