hard start for screen.seqs

Hi Pat,

I used 454 pyrosequencing for 16s rRNA analysis. I encountered that the “start” of some of my sequences lie much earlier than the expected majority “start” position, after sequence alignment and screening (artificial causes, I presume)
I was not able to remove these trouble sequences as the rest of the sequence lied on the correct position (start, end, minlength, optimized were not able to target them specifically).

I thought it would be handy to have a “hard” start command for screen.seqs, to restrict those sequences which start at the majority during screen.seqs, without having to create contigs reports file to use ostart/oendcommand :slight_smile:

In the end, I used the align reports file to identify sequences outside the template start/stop position, and remove them by remove.seqs(accnos file), or is there any other ways others have thought of?

Here are the commands I used:

mothur > align.seqs(fasta=5172stage3.trim.unique.fasta, reference=silva.archaea.fasta)

mothur > summary.seqs()
Using 5172stage3.trim.unique.align as input file for the fasta parameter.
[WARNING]: This command can take a namefile and you did not provide one. The current namefile is 5172stage3.trim.names which seems to
match 5172stage3.trim.unique.align.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 15667 16356 53 0 3 1900
25%-tile: 15667 23959 202 0 5 18999
Median: 15667 27651 325 0 5 37997
75%-tile: 15667 27651 325 0 5 56995
97.5%-tile: 15669 28467 329 0 5 74093
Maximum: 43116 43116 532 0 8 75992
Mean: 15928.5 25878.1 259.605 0 4.7108

of Seqs: 75992

Output File Names:
5172stage3.trim.unique.summary

mothur > screen.seqs(fasta=5172stage3.trim.unique.align., name=5172stage3.trim.names, group=5172stage3.groups, start=15667,
minlength=196)

mothur > summary.seqs(processors=2)
Using 5172stage3.trim.unique.align.good. as input file for the fasta parameter.
[WARNING]: This command can take a namefile and you did not provide one. The current namefile is 5172stage3.trim.good.names which seems
to match 5172stage3.trim.unique.align.good…

Start End NBases Ambigs Polymer NumSeqs
Minimum: 2035 11887 196 0 3 1
2.5%-tile: 15667 23447 206 0 4 1426
25%-tile: 15667 27639 322 0 5 14255
Median: 15667 27651 325 0 5 28509
75%-tile: 15667 27651 325 0 5 42763
97.5%-tile: 15667 27654 330 0 5 55592
Maximum: 15667 32473 532 0 8 57017
Mean: 15660.1 27217.1 308.852 0 4.98781

of Seqs: 57017

mothur > filter.seqs(fasta=current, vertical=T, trump=., processors=1)
Using 5172stage3.trim.unique.align.good. as input file for the fasta parameter.

Length of filtered alignment: 0
Number of columns removed: 50000
Length of the original alignment: 50000
Number of sequences used to construct filter: 57017

Best regards,
Yoga

Try adding end=23447, that will remove sequences that end before 23447.