Screen.seq fail

I am having difficulties figuring out why screen.seqs is completely removing >99% of my reads. As per the SOP, you run screen.seqs to get sequences that start at or before/after the start/end positions. However, instead of keeping these sequences, it completely eliminates them.

I am using mothur v.1.48.1

mothur > 
summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table)

Using 8 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	1	1	0	1	1
2.5%-tile:	1	18929	440	0	4	95676
25%-tile:	1	18929	442	0	5	956752
Median: 	1	18929	460	0	5	1913504
75%-tile:	1	18929	465	0	6	2870255
97.5%-tile:	1	18929	466	0	8	3731331
Maximum:	18932	18932	466	0	8	3827006
Mean:	19	18914	454	0	5
# of unique seqs:	2695221
total # of seqs:	3827006

It took 1513 secs to summarize 3827006 sequences.

Output File Names:
stability.trim.contigs.good.unique.summary

mothur > 
screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, summary=stability.trim.contigs.good.unique.summary,  maxhomop=8, start=1, end=18932)

Using 8 processors.

It took 1125 secs to screen 2695221 sequences, removed 2694249.

/******************************************/
Running command: remove.seqs(accnos=stability.trim.contigs.good.unique.bad.accnos.temp, count=stability.trim.contigs.good.count_table)
Removed 3825869 sequences from stability.trim.contigs.good.count_table.

Output File Names:
stability.trim.contigs.good.pick.count_table

/******************************************/

Output File Names:
stability.trim.contigs.good.unique.good.summary
stability.trim.contigs.good.unique.good.align
stability.trim.contigs.good.unique.bad.accnos
stability.trim.contigs.good.good.count_table


It took 1596 secs to screen 2695221 sequences.

Thank you so much for your help

Thanks for your question. screen.seqs is doing exactly what you told it to :rofl:

You used start=1, end=18932 as arguments. This means that each sequence that passes should start at or before 1 and end at or after 18932. I suspect nearly all of your sequences end at 18929, which of course, is before 18932 so those sequences are getting removed. I’d suggest using start=1, end=18929.

Also, it appears you’re using one of the longer regions of the 16S rRNA genes. I’d encourage you to check out this blogpost from a few years ago that is still painfully relevant…

Take care,
Pat

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.