Aligning producing sequences with 0 bases. Why

Hi All

Some help with the following would be really appreciated as I don’t want to waste time progressing only to find that there’s something seriously amiss with my data. Below is the short batch script I’ve used on a 454 data set beginning with an sff file provided directly by the sequence provider. I’ve run it both without flipping in the trim.seqs stage (first half below, in bold), but as I was not happy with the result I’ve tried renaming the original sff file and rerunning using the flip=T parameter. The results of the latter (as shown below) are nonsense. I’ve merely pasted the summary of the last two operations for each run.

Firstly, I thought the Flip=T parameter tries both flipping and not flipping to see which is best for each sequence? If so I would expect the outputs to be similar, unless of course not flipping doesn’t work at all. In fact the results below show relatively reasonable results from the unflipped sequences following alignment and the unique.seqs command. The exception being that while most sequences start at 1046, the max is 43116, and ends at the same.

There’s obviously something amiss here, having sequences of 0 bases doesn’t makes sense given the previous steps, but for the life of me I don’t know how this has happened. I’ve tried each step manually and repeated again as the batch below, with the same result. Luckily, running via a supercomputing network, I’ve not lost as much time as I otherwise could have.

All input gratefully received please. Thanks.

BATCH SCRIPT USED
sffinfo(sff=Kelly3832B.sff, flow=T)
trim.flows(flow=Kelly3832B.flow, oligos=Irwelloligos.txt, pdiffs=2, bdiffs=1, minflows=320, processors=12)
summary.seqs()
shhh.flows(file=Kelly3832B.flow.files)
trim.seqs(fasta=Kelly3832B.shhh.fasta, name=Kelly3832B.shhh.names, oligos=Irwelloligos.txt, maxhomop=8, pdiffs=2, bdiffs=1, minlength=200)
summary.seqs()
unique.seqs()
summary.seqs()
align.seqs(reference=silva.bacteria.fasta)
summary.seqs()

[b]Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 200 200 0 2 1
2.5%-tile: 1 210 210 0 3 2292
25%-tile: 1 258 258 0 4 22912
Median: 1 266 266 0 5 45824
75%-tile: 1 273 273 0 5 68736
97.5%-tile:1 285 285 0 6 89356
Maximum: 1 303 303 0 8 91647
Mean: 1 262.182262.182 0 4.51803

of Seqs: 91647

Output File Names: Kelly3832B.shhh.trim.summary

Accnos report generated during alignment 3kb

Summary post align.seqs
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 1044 5278 206 0 3 1532
25%-tile: 1044 5703 256 0 4 15317
Median: 1044 6098 265 0 5 30634
75%-tile: 1044 6216 272 0 5 45951
97.5%-tile:1046 6394 285 0 6 59736
Maximum: 43116 43116 303 0 8 61267 [THIS RESULT DOES NOT MAKE SENSE] :o[/b]
[b]Mean: 1132.99 6005.18 259.747 0 4.54969

of Seqs: 61267

Output File Names: Kelly3832B.shhh.trim.unique.summary[/b]





BATCH SCRIPT USED WHEN FLIP USED - RESULTS MAKE EVEN LESS SENSE THAN WHEN I DON'T FLIP sffinfo(sff=Irwell.sff, flow=T) trim.flows(flow=Irwell.flow, oligos=Irwelloligos.txt, pdiffs=2, bdiffs=1, minflows=320, processors=12) summary.seqs() shhh.flows(file=Irwell.flow.files) trim.seqs(fasta=Irwell.shhh.fasta, name=Irwell.shhh.names, oligos=Irwelloligos.txt, maxhomop=8, pdiffs=2, bdiffs=1, minlength=200, flip=T) summary.seqs() unique.seqs() summary.seqs() align.seqs(reference=silva.bacteria.fasta) summary.seqs()

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 200 200 0 2 1
2.5%-tile: 1 210 210 0 3 2292
25%-tile: 1 258 258 0 4 22914
Median: 1 266 266 0 5 45827
75%-tile: 1 273 273 0 5 68740
97.5%-tile:1 285 285 0 6 89362
Maximum: 1 303 303 0 8 91653
Mean: 1 262.182 262.182 0 4.51801

of Seqs: 91653

Output File Names: Irwell.shhh.trim.summary

Accnos report generated during alignment almost 900kb!

Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 1044 1046 1 0 1 1532
25%-tile: 1044 1065 3 0 2 15320
Median: 1044 1106 7 0 2 30640
75%-tile: 43103 43116 14 0 2 45960
97.5%-tile:43116 43116 23 0 3 59748
Maximum: 43116 43116 79 0 7 61279
Mean: 18686.2 18711.7 10.1003 0 1.92348

of Seqs: 61279

Output File Names: Irwell.shhh.trim.unique.summary

Hi Laura,

In align.seqs you should try using flip=T, not in trim.seqs. Because some of your sequences don’t align, they come back as having a length of zero to either the start or the end of hte reference alignment. For what you have, I’d suggest running screen.seqs with start=1044, end=5278. Also, in trim.flows, I’d set maxflows to the same value as minflows so that you get good denoising.

Pat

Hi Pat

Thanks for your suggestion. I’m afraid the flip=T in align also resulted in the same thing, however with your (and Sarahs suggestion, by email), I’ve managed to progress and get to the OTU stage, though have come a cropper with the make.shared command now, and confused over the groups file to be use. I’ve posted a separate question for this.
Thanks again to you and Sarah.