Hi Pat,
I used 454 pyrosequencing for 16s rRNA analysis. I encountered that the “start” of some of my sequences lie much earlier than the expected majority “start” position, after sequence alignment and screening (artificial causes, I presume)
I was not able to remove these trouble sequences as the rest of the sequence lied on the correct position (start, end, minlength, optimized were not able to target them specifically).
I thought it would be handy to have a “hard” start command for screen.seqs, to restrict those sequences which start at the majority during screen.seqs, without having to create contigs reports file to use ostart/oendcommand
In the end, I used the align reports file to identify sequences outside the template start/stop position, and remove them by remove.seqs(accnos file), or is there any other ways others have thought of?
Here are the commands I used:
mothur > align.seqs(fasta=5172stage3.trim.unique.fasta, reference=silva.archaea.fasta)
mothur > summary.seqs()
Using 5172stage3.trim.unique.align as input file for the fasta parameter.
[WARNING]: This command can take a namefile and you did not provide one. The current namefile is 5172stage3.trim.names which seems to
match 5172stage3.trim.unique.align.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 15667 16356 53 0 3 1900
25%-tile: 15667 23959 202 0 5 18999
Median: 15667 27651 325 0 5 37997
75%-tile: 15667 27651 325 0 5 56995
97.5%-tile: 15669 28467 329 0 5 74093
Maximum: 43116 43116 532 0 8 75992
Mean: 15928.5 25878.1 259.605 0 4.7108
of Seqs: 75992
Output File Names:
5172stage3.trim.unique.summary
mothur > screen.seqs(fasta=5172stage3.trim.unique.align., name=5172stage3.trim.names, group=5172stage3.groups, start=15667,
minlength=196)
mothur > summary.seqs(processors=2)
Using 5172stage3.trim.unique.align.good. as input file for the fasta parameter.
[WARNING]: This command can take a namefile and you did not provide one. The current namefile is 5172stage3.trim.good.names which seems
to match 5172stage3.trim.unique.align.good…
Start End NBases Ambigs Polymer NumSeqs
Minimum: 2035 11887 196 0 3 1
2.5%-tile: 15667 23447 206 0 4 1426
25%-tile: 15667 27639 322 0 5 14255
Median: 15667 27651 325 0 5 28509
75%-tile: 15667 27651 325 0 5 42763
97.5%-tile: 15667 27654 330 0 5 55592
Maximum: 15667 32473 532 0 8 57017
Mean: 15660.1 27217.1 308.852 0 4.98781
of Seqs: 57017
mothur > filter.seqs(fasta=current, vertical=T, trump=., processors=1)
Using 5172stage3.trim.unique.align.good. as input file for the fasta parameter.
Length of filtered alignment: 0
Number of columns removed: 50000
Length of the original alignment: 50000
Number of sequences used to construct filter: 57017
Best regards,
Yoga