More than 90% of sequences lost after screening

Kumari_Richa · December 12, 2016, 12:35pm

Hello,

I had 5520209 sequences after make.contigs but was left with only 336589 after screen.seqs. I expect sequence lengths of 370-376 bp. I have pasted the summary after each step below.

make.contigs(file=stability.files, processors=4)
mothur > summary.seqs(fasta=current)
Using stability.trim.contigs.trim.fasta as input file for the fasta parameter.

Using 1 processors.

             Start  End      NBases   Ambigs      Polymer  NumSeqs
Minimum:     1      51       51       0           2        1
2.5%-tile:   1      120      120      0           3        138006
25%-tile:    1      214      214      1           4        1380053
Median:      1      274      274      2           4        2760105
75%-tile:    1      332      332      10          5        4140157
97.5%-tile:  1      411      411      31          6        5382204
Maximum:     1      498      498      57          249      5520209
Mean:        1      271.079  271.079  6.37958     4.50889

# of Seqs: 5520209

mothur > trim.seqs(fasta=stability.trim.contigs.fasta, oligos=primer.oligos, pdiffs=2, flip=T)
mothur > summary.seqs(fasta=current)
Using stability.trim.contigs.trim.fasta as input file for the fasta parameter.

Using 1 processors.

             Start  End      NBases   Ambigs      Polymer  NumSeqs
Minimum:     1      15       15       0           2        1
2.5%-tile:   1      93       93       0           4        75324
25%-tile:    1      189      189      0           4        753238
Median:      1      253      253      3           4        1506475
75%-tile:    1      306      306      8           5        2259712
97.5%-tile:  1      373      373      23          6        2937626
Maximum:     1      459      459      57          85       3012949
Mean:        1      245.66   245.66   5.19473     4.53316

# of Seqs: 3012949

mothur > screen.seqs(fasta=stability.trim.contigs.trim.fasta, group=stability.contigs.pick.groups, minlength=370)
mothur > summary.seqs(fasta=current)
Using stability.trim.contigs.trim.good.fasta as input file for the fasta parameter.

Using 1 processors.

             Start  End      NBases   Ambigs      Polymer  NumSeqs
Minimum:     1      370      370      0           3        1
2.5%-tile:   1      370      370      0           4        8415
25%-tile:    1      371      371      0           4        84148
Median:      1      372      372      0           5        168295
75%-tile:    1      373      373      0           5        252442
97.5%-tile:  1      376      376      0           6        328175
Maximum:     1      459      459      19          10       336589
Mean:        1      372.151  372.151  0.00623609  4.71698

# of Seqs: 336589

It is evident that many sequences are shorter than 370 bp. What minimum length should I choose to get better results, and what would be an "acceptable" limit?

Thanks in advance for your help.
Richa
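
For reference, the loss at each step can be tallied directly from the "# of Seqs" lines above. The following is only a minimal Python sketch and was not part of the original post; the helper name `retention_table` is made up for illustration, and the counts are copied from the summaries.

```python
# Sequence counts copied from the "# of Seqs" lines in the summaries above.
counts = [
    ("make.contigs", 5520209),
    ("trim.seqs", 3012949),
    ("screen.seqs (minlength=370)", 336589),
]


def retention_table(steps):
    """Return (step, count, fraction of previous step, fraction of first step)."""
    first = steps[0][1]
    prev = first
    rows = []
    for name, n in steps:
        rows.append((name, n, n / prev, n / first))
        prev = n
    return rows


for name, n, vs_prev, vs_first in retention_table(counts):
    print(f"{name:<28} {n:>8}  {vs_prev:6.1%} of previous  {vs_first:6.1%} of start")
```

On these numbers, roughly 55% of the contigs survive trim.seqs, about 11% of those pass the 370 nt length screen, and about 6% of the original contigs remain overall.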

pschloss · December 12, 2016, 1:34pm

Hi,

What are you sequencing and with which chemistry? If it’s a 16S region, then I think there are big problems with the data. The quality of your data is quite poor, which is likely making it very difficult to find matches to your barcodes and primers. If you look at the output from make.contigs, you’ll see that at least 75% of your sequences have an ambiguous base call in them. Furthermore, if you expect sequences that are ~370 nt long, then a range between 51 and 498 is way too broad for any 16S region that I know of.

Pat
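
The "at least 75%" figure above can be read straight off the percentile rows of the make.contigs summary. The following is a rough Python sketch of that reasoning, not anything mothur itself provides: the function names are hypothetical, and it assumes each percentile row can be treated as a rank cut-off in the sorted column.

```python
# Percentile rows copied from the summary.seqs output after make.contigs.
# Each tuple is (percentile, NBases, Ambigs).
percentiles = [
    (0.0, 51, 0),
    (2.5, 120, 0),
    (25.0, 214, 1),
    (50.0, 274, 2),
    (75.0, 332, 10),
    (97.5, 411, 31),
    (100.0, 498, 57),
]


def min_share_at_or_above(rows, column, threshold):
    """Lower bound (in %) on the share of sequences with value >= threshold.

    If the value at percentile p already meets the threshold, every sequence
    ranked above p does too, i.e. at least (100 - p)% of the sequences.
    """
    for row in rows:
        if row[column] >= threshold:
            return 100.0 - row[0]
    return 0.0


def max_share_at_or_above(rows, column, threshold):
    """Upper bound (in %) on the share of sequences with value >= threshold.

    If the value at percentile p is still below the threshold, at least p%
    of the sequences fall short, so at most (100 - p)% can reach it.
    """
    highest_below = 0.0
    for row in rows:
        if row[column] < threshold:
            highest_below = row[0]
    return 100.0 - highest_below


# Column 2 is Ambigs: share of contigs with at least one ambiguous base.
print("min % with >= 1 ambiguous base:", min_share_at_or_above(percentiles, 2, 1))
# Column 1 is NBases: share of contigs that reach the expected 370 nt.
print("min % with length >= 370:", min_share_at_or_above(percentiles, 1, 370))
print("max % with length >= 370:", max_share_at_or_above(percentiles, 1, 370))
```

Read this way, the 25%-tile row already shows one ambiguous base, so at least 75% of contigs contain one, and because the 75%-tile length is only 332 nt, at most about a quarter of the contigs can reach 370 nt before any trimming.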
