Hi All
Some help with the following would be really appreciated as I don’t want to waste time progressing only to find that there’s something seriously amiss with my data. Below is the short batch script I’ve used on a 454 data set beginning with an sff file provided directly by the sequence provider. I’ve run it both without flipping in the trim.seqs stage (first half below, in bold), but as I was not happy with the result I’ve tried renaming the original sff file and rerunning using the flip=T parameter. The results of the latter (as shown below) are nonsense. I’ve merely pasted the summary of the last two operations for each run.
Firstly, I thought the Flip=T parameter tries both flipping and not flipping to see which is best for each sequence? If so I would expect the outputs to be similar, unless of course not flipping doesn’t work at all. In fact the results below show relatively reasonable results from the unflipped sequences following alignment and the unique.seqs command. The exception being that while most sequences start at 1046, the max is 43116, and ends at the same.
There’s obviously something amiss here, having sequences of 0 bases doesn’t makes sense given the previous steps, but for the life of me I don’t know how this has happened. I’ve tried each step manually and repeated again as the batch below, with the same result. Luckily, running via a supercomputing network, I’ve not lost as much time as I otherwise could have.
All input gratefully received please. Thanks.
BATCH SCRIPT USED
sffinfo(sff=Kelly3832B.sff, flow=T)
trim.flows(flow=Kelly3832B.flow, oligos=Irwelloligos.txt, pdiffs=2, bdiffs=1, minflows=320, processors=12)
summary.seqs()
shhh.flows(file=Kelly3832B.flow.files)
trim.seqs(fasta=Kelly3832B.shhh.fasta, name=Kelly3832B.shhh.names, oligos=Irwelloligos.txt, maxhomop=8, pdiffs=2, bdiffs=1, minlength=200)
summary.seqs()
unique.seqs()
summary.seqs()
align.seqs(reference=silva.bacteria.fasta)
summary.seqs()
[b]Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 200 200 0 2 1
2.5%-tile: 1 210 210 0 3 2292
25%-tile: 1 258 258 0 4 22912
Median: 1 266 266 0 5 45824
75%-tile: 1 273 273 0 5 68736
97.5%-tile:1 285 285 0 6 89356
Maximum: 1 303 303 0 8 91647
Mean: 1 262.182262.182 0 4.51803
of Seqs: 91647
Output File Names: Kelly3832B.shhh.trim.summary
Accnos report generated during alignment 3kb
Summary post align.seqs
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 1044 5278 206 0 3 1532
25%-tile: 1044 5703 256 0 4 15317
Median: 1044 6098 265 0 5 30634
75%-tile: 1044 6216 272 0 5 45951
97.5%-tile:1046 6394 285 0 6 59736
Maximum: 43116 43116 303 0 8 61267 [THIS RESULT DOES NOT MAKE SENSE] :o[/b]
[b]Mean: 1132.99 6005.18 259.747 0 4.54969
of Seqs: 61267
Output File Names: Kelly3832B.shhh.trim.unique.summary[/b]
BATCH SCRIPT USED WHEN FLIP USED - RESULTS MAKE EVEN LESS SENSE THAN WHEN I DON'T FLIP sffinfo(sff=Irwell.sff, flow=T) trim.flows(flow=Irwell.flow, oligos=Irwelloligos.txt, pdiffs=2, bdiffs=1, minflows=320, processors=12) summary.seqs() shhh.flows(file=Irwell.flow.files) trim.seqs(fasta=Irwell.shhh.fasta, name=Irwell.shhh.names, oligos=Irwelloligos.txt, maxhomop=8, pdiffs=2, bdiffs=1, minlength=200, flip=T) summary.seqs() unique.seqs() summary.seqs() align.seqs(reference=silva.bacteria.fasta) summary.seqs()
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 200 200 0 2 1
2.5%-tile: 1 210 210 0 3 2292
25%-tile: 1 258 258 0 4 22914
Median: 1 266 266 0 5 45827
75%-tile: 1 273 273 0 5 68740
97.5%-tile:1 285 285 0 6 89362
Maximum: 1 303 303 0 8 91653
Mean: 1 262.182 262.182 0 4.51801
of Seqs: 91653
Output File Names: Irwell.shhh.trim.summary
Accnos report generated during alignment almost 900kb!
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 1044 1046 1 0 1 1532
25%-tile: 1044 1065 3 0 2 15320
Median: 1044 1106 7 0 2 30640
75%-tile: 43103 43116 14 0 2 45960
97.5%-tile:43116 43116 23 0 3 59748
Maximum: 43116 43116 79 0 7 61279
Mean: 18686.2 18711.7 10.1003 0 1.92348
of Seqs: 61279
Output File Names: Irwell.shhh.trim.unique.summary