Hi There,
I have 16S data amplified with primers 28F-519R, covering regions V1 to V3.
I trimmed the data using:
mothur > trim.seqs(fasta=X.fna, oligos=X.oligos, qfile=X.qual, maxambig=0, maxhomop=8, minlength=200, qaverage=25)
and aligned against the SILVA reference:
mothur > align.seqs(fasta=X.trim.rename.unique.fasta, reference=silva.bacteria.fasta, flip=t)
I had quite a few sequences (11,000) in the flip.accnos file from one of my analyses, but I kept going.
In one case I get:
mothur > summary.seqs(fasta=16S_ads_all.trim.rename.unique.align)
Using 1 processors.
             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     0        0        0        0       1        1
2.5%-tile:   1044     1109     12       0       2        1442
25%-tile:    1044     8365     285      0       4        14417
Median:      1044     9916     399      0       5        28833
75%-tile:    1044     13130    470      0       5        43249
97.5%-tile:  43061    43116    501      0       6        56223
Maximum:     43116    43117    535      0       8        57664
Mean:        7144.03  14110.3  336.496  0       4.58171
# of Seqs:   57664
In another case I get:
mothur > summary.seqs(fasta=16S_juvs_all.trim.rename.unique.align)
Using 1 processors.
             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1044     1046     2        0       1        1
2.5%-tile:   1044     5711     246      0       4        1638
25%-tile:    1044     8411     370      0       5        16377
Median:      1044     9964     406      0       5        32753
75%-tile:    1044     11888    446      0       5        49129
97.5%-tile:  1044     13862    493      0       6        63868
Maximum:     43112    43116    531      0       8        65505
Mean:        1178.13  10195.5  397.359  0       4.89766
# of Seqs:   65505
It seems like almost all of them start at position 1044, so I guess that would be a good point for screening. But what I do not understand is why I have sequences with 2 bp and 12 bp when I asked for a minlength of 200 bp during trimming.
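One way to find the reads behind those 2 bp and 12 bp rows is to recompute Start, End and NBases per sequence from the aligned file and list the ones below 200 bases. This is only a minimal Python sketch, not part of mothur. It assumes that '.' and '-' are the gap/padding characters in the .align file, that positions are 1-based, and that a sequence with no bases is reported as 0 0 0. The function names are hypothetical.

```python
GAP_CHARS = {".", "-"}  # assumed gap/padding symbols in the .align file


def summarize_aligned(seq):
    """Return (start, end, nbases) for one aligned sequence string."""
    # 1-based alignment columns that hold a real base
    positions = [i + 1 for i, ch in enumerate(seq) if ch not in GAP_CHARS]
    if not positions:
        return (0, 0, 0)  # assumption: empty alignments are reported as zeros
    return (positions[0], positions[-1], len(positions))


def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                header = line[1:].split()
                name, chunks = (header[0] if header else ""), []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def short_reads(records, minlength=200):
    """Names of sequences whose aligned base count is below minlength."""
    return [name for name, seq in records
            if summarize_aligned(seq)[2] < minlength]
```

For example, short_reads(read_fasta("16S_ads_all.trim.rename.unique.align")) would list the names to look up in the unaligned trimmed file, to compare their lengths before and after alignment.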
So this is what I did:
mothur > screen.seqs(fasta=16S_ads_all.trim.rename.unique.align, name=16S_ads_all.trim.rename.names, minlength=200, start=1044)
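As I understand it, this call should keep a sequence only if it starts at or before position 1044 and still has at least 200 bases. The Python sketch below spells out that rule so it can be checked against the summary; it is an assumption about how the two criteria combine, not mothur's own code. The function name is hypothetical, and the example tuples borrow values from the summaries above purely for illustration, since they are not actual reads.

```python
def passes_screen(start, nbases, max_start=1044, minlength=200):
    """Assumed rule: keep reads that start by max_start and have enough bases."""
    return start <= max_start and nbases >= minlength


# Illustrative (start, nbases) pairs, borrowed from the summary rows above
examples = {
    "typical": (1044, 399),
    "late_start": (43116, 535),
    "too_short": (1044, 2),
}

kept = [label for label, (start, nbases) in examples.items()
        if passes_screen(start, nbases)]
print(kept)  # ['typical']
```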
and I get this:
mothur > summary.seqs()
Using 16S_ads_all.trim.rename.unique.good.align as input file for the fasta parameter.
Using 1 processors.
[WARNING]: This command can take a namefile and you did not provide one. The current namefile is 16S_ads_all.trim.rename.good.names which seems to match 16S_ads_all.trim.rename.unique.good.align.
             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1044     3603     200      0       3        1
2.5%-tile:   1044     5714     257      0       4        1108
25%-tile:    1044     8403     383      0       5        11079
Median:      1044     9870     418      0       5        22157
75%-tile:    1044     11885    482      0       5        33235
97.5%-tile:  1044     13862    502      0       6        43205
Maximum:     1044     14979    529      0       8        44312
Mean:        1044     9767.72  416.821  0       5.04306
# of Seqs:   44312
Output File Name:
16S_ads_all.trim.rename.unique.good.summary
This seems good. Is this correct, or is there more to do?
Thanks!