Understanding screen.seqs?

Hi there,
I have 16S data from the 28F-519R primers, covering regions V1 to V3.
I trimmed the data using:
mothur > trim.seqs(fasta=X.fna, oligos=X.oligos, qfile=X.qual, maxambig=0, maxhomop=8, minlength=200, qaverage=25)
and aligned it against the SILVA reference:

mothur > align.seqs(fasta=X.trim.rename.unique.fasta, reference=silva.bacteria.fasta, flip=t)
Quite a few sequences (11,000) ended up in the flip.accnos file for one of my analyses… I kept going… and this is what I get:

In one case I get:
mothur > summary.seqs(fasta=16S_ads_all.trim.rename.unique.align)


Using 1 processors.

            Start   End     NBases  Ambigs  Polymer  NumSeqs
Minimum:    0       0       0       0       1        1
2.5%-tile:  1044    1109    12      0       2        1442
25%-tile:   1044    8365    285     0       4        14417
Median:     1044    9916    399     0       5        28833
75%-tile:   1044    13130   470     0       5        43249
97.5%-tile: 43061   43116   501     0       6        56223
Maximum:    43116   43117   535     0       8        57664
Mean:       7144.03 14110.3 336.496 0       4.58171

# of Seqs:  57664

In another case I get:

mothur > summary.seqs(fasta=16S_juvs_all.trim.rename.unique.align)


Using 1 processors.

            Start   End     NBases  Ambigs  Polymer  NumSeqs
Minimum:    1044    1046    2       0       1        1
2.5%-tile:  1044    5711    246     0       4        1638
25%-tile:   1044    8411    370     0       5        16377
Median:     1044    9964    406     0       5        32753
75%-tile:   1044    11888   446     0       5        49129
97.5%-tile: 1044    13862   493     0       6        63868
Maximum:    43112   43116   531     0       8        65505
Mean:       1178.13 10195.5 397.359 0       4.89766

# of Seqs:  65505


It seems like they all start at position 1044, so I guess that would be a good point for screening. What I do not understand is why I have sequences with 2 bp and 12 bp when I asked for a minlength of 200 bp during trimming.
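On the 2 bp and 12 bp question: the summary columns describe the aligned sequences, not the reads that came out of trim.seqs. Below is a minimal Python sketch of how Start, End and NBases could be derived from one aligned sequence, assuming gaps are written as '-' or '.' and NBases counts the remaining non-gap characters. It also reproduces the NumSeqs ranks in the tables above using rank = floor(p * N) + 1, which is an observation from these numbers rather than a documented mothur formula. `seq_stats` and `percentile_row` are hypothetical helpers, not mothur code.

```python
import math

# Assumption: '-' and '.' are the only gap characters in the aligned FASTA.
GAP_CHARS = set("-.")


def seq_stats(aligned: str) -> tuple[int, int, int]:
    """Return (start, end, nbases) for one aligned sequence, using 1-based columns."""
    positions = [i + 1 for i, ch in enumerate(aligned) if ch not in GAP_CHARS]
    if not positions:
        # No base survived alignment; reporting zeros is this sketch's convention.
        return 0, 0, 0
    return positions[0], positions[-1], len(positions)


def percentile_row(values: list[int], p: float) -> tuple[int, int]:
    """Return (value, rank) at percentile p of one column, with rank = floor(p * N) + 1."""
    ordered = sorted(values)
    n = len(ordered)
    rank = min(math.floor(p * n) + 1, n)  # clamp so p = 1.0 gives the maximum
    return ordered[rank - 1], rank
```

Under this reading, a read that was 200 bp or longer after trimming can still report only a few bases if most of it was dropped during alignment, so NBases can fall below the trim.seqs minlength.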

So this is what I did:

mothur > screen.seqs(fasta=16S_ads_all.trim.rename.unique.align, name=16S_ads_all.trim.rename.names, minlength=200, start=1044)
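The two criteria in that screen.seqs call can be expressed as a simple filter over per-sequence summaries. This is a minimal Python sketch, assuming start=X removes sequences whose first aligned base lies after column X and minlength=N removes sequences with fewer than N bases (worth confirming on the mothur wiki). `SeqSummary` and `screen` are hypothetical names, not mothur code.

```python
from dataclasses import dataclass


@dataclass
class SeqSummary:
    """Hypothetical per-sequence record mirroring the summary.seqs columns."""
    name: str
    start: int
    end: int
    nbases: int


def screen(records: list[SeqSummary], start: int, minlength: int) -> tuple[list[SeqSummary], list[str]]:
    """Split records into (kept, removed_names) using the assumed start/minlength rules."""
    kept, removed = [], []
    for rec in records:
        # Assumed rule: a sequence beginning after the chosen column does not cover the region.
        starts_too_late = rec.start > start
        # Assumed rule: too few bases left in the aligned sequence.
        too_short = rec.nbases < minlength
        if starts_too_late or too_short:
            removed.append(rec.name)
        else:
            kept.append(rec)
    return kept, removed


records = [
    SeqSummary("a", 1044, 9916, 399),
    SeqSummary("b", 43061, 43116, 501),
    SeqSummary("c", 1044, 1109, 12),
    SeqSummary("d", 1000, 5000, 250),
]
kept, removed = screen(records, start=1044, minlength=200)
```

The example rows are illustrative values taken from the first summary table: the record starting at 43061 and the 12-base record are both dropped, which is consistent with the screened summary, where every sequence starts at 1044 and the minimum NBases is 200.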

and I get this:


mothur > summary.seqs()

Using 16S_ads_all.trim.rename.unique.good.align as input file for the fasta parameter.

Using 1 processors.
[WARNING]: This command can take a namefile and you did not provide one. The current namefile is 16S_ads_all.trim.rename.good.names which seems to match 16S_ads_all.trim.rename.unique.good.align.

            Start   End     NBases  Ambigs  Polymer  NumSeqs
Minimum:    1044    3603    200     0       3        1
2.5%-tile:  1044    5714    257     0       4        1108
25%-tile:   1044    8403    383     0       5        11079
Median:     1044    9870    418     0       5        22157
75%-tile:   1044    11885   482     0       5        33235
97.5%-tile: 1044    13862   502     0       6        43205
Maximum:    1044    14979   529     0       8        44312
Mean:       1044    9767.72 416.821 0       5.04306

# of Seqs:  44312

Output File Name:
16S_ads_all.trim.rename.unique.good.summary

This seems good. Is this correct, or is there more to do?

Thanks!

Everything looks correct. You’ll see sequences start/end at 1044 or 43117 when a fragment that isn’t really a 16S sequence gets into the data. Because it doesn’t resemble anything in the reference database, it gets truncated during alignment. This turns out to be a convenient way to get rid of garbage.
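To see how many sequences fall into that truncated group before screening, one option is to tally start positions from the .summary file that summary.seqs writes (for example 16S_ads_all.trim.rename.unique.summary, assuming the same naming pattern as the output file shown above). This is a minimal Python sketch, assuming the file is tab-separated with a header row containing a `start` column; check your own file's header first. `tally_starts` and `outside_expected` are hypothetical helpers, and the example rows are illustrative, not real output.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def tally_starts(handle: TextIO) -> Counter:
    """Count how many sequences begin at each alignment column (assumes a 'start' header)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(int(row["start"]) for row in reader)


def outside_expected(counts: Counter, expected: int) -> int:
    """Number of sequences whose first aligned base lies after the expected column."""
    return sum(n for pos, n in counts.items() if pos > expected)


# Illustrative in-memory stand-in for a .summary file.
example = io.StringIO(
    "seqname\tstart\tend\tnbases\tambigs\tpolymer\n"
    "r1\t1044\t9916\t399\t0\t5\n"
    "r2\t1044\t8365\t285\t0\t4\n"
    "r3\t43061\t43116\t501\t0\t6\n"
)
counts = tally_starts(example)
late = outside_expected(counts, expected=1044)
```

With a real file you would pass `open(path, newline="")` instead of the StringIO object; a large count far from 1044 points to the off-target fragments described in the reply.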