Hello,
I had 5520209 sequences after make.contigs but was left with only 336589 sequences after screen.seqs. I expect the sequence length to be between 370 and 376 bp. I have pasted the summary from each step below.
make.contigs(file=stability.files, processors=4)
mothur > summary.seqs(fasta=current)
Using stability.trim.contigs.trim.fasta as input file for the fasta parameter.
Using 1 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 51 51 0 2 1
2.5%-tile: 1 120 120 0 3 138006
25%-tile: 1 214 214 1 4 1380053
Median: 1 274 274 2 4 2760105
75%-tile: 1 332 332 10 5 4140157
97.5%-tile: 1 411 411 31 6 5382204
Maximum: 1 498 498 57 249 5520209
Mean: 1 271.079 271.079 6.37958 4.50889
# of Seqs: 5520209
trim.seqs(fasta=stability.trim.contigs.fasta, oligos=primer.oligos, pdiffs=2, flip=T)
mothur > summary.seqs(fasta=current)
Using stability.trim.contigs.trim.fasta as input file for the fasta parameter.
Using 1 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 15 15 0 2 1
2.5%-tile: 1 93 93 0 4 75324
25%-tile: 1 189 189 0 4 753238
Median: 1 253 253 3 4 1506475
75%-tile: 1 306 306 8 5 2259712
97.5%-tile: 1 373 373 23 6 2937626
Maximum: 1 459 459 57 85 3012949
Mean: 1 245.66 245.66 5.19473 4.53316
# of Seqs: 3012949
mothur > screen.seqs(fasta=stability.trim.contigs.trim.fasta, group=stability.contigs.pick.groups, minlength=370)
mothur > summary.seqs(fasta=current)
Using stability.trim.contigs.trim.good.fasta as input file for the fasta parameter.
Using 1 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 370 370 0 3 1
2.5%-tile: 1 370 370 0 4 8415
25%-tile: 1 371 371 0 4 84148
Median: 1 372 372 0 5 168295
75%-tile: 1 373 373 0 5 252442
97.5%-tile: 1 376 376 0 6 328175
Maximum: 1 459 459 19 10 336589
Mean: 1 372.151 372.151 0.00623609 4.71698
# of Seqs: 336589
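To put the losses in numbers, here is a minimal Python sketch (not part of the mothur workflow; the `retention` helper and the step labels are illustrative names of my own) that computes what fraction of the original contigs remains after each step, using the sequence totals from the summaries above.

```python
# Sequence totals copied from the three summary.seqs outputs above.
counts = [
    ("make.contigs", 5520209),
    ("trim.seqs", 3012949),
    ("screen.seqs (minlength=370)", 336589),
]


def retention(counts):
    """Return (step, count, fraction of the first step's count) per step."""
    total = counts[0][1]
    return [(step, n, n / total) for step, n in counts]


# Print one line per step: label, sequence count, share of the starting contigs.
for step, n, frac in retention(counts):
    print(f"{step:<30} {n:>9} {frac:7.1%}")
```

By this calculation roughly 55% of the contigs survive trim.seqs and only about 6% survive screen.seqs with minlength=370.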
It is evident that many sequences are shorter than 370 bp. My question is: what minimum length should I choose for better results? What would be an "acceptable" limit?
Thanks in advance for your help.
Richa