Aligning producing sequences with 0 bases. Why

Laura_Kelly · September 19, 2016, 4:30pm

Hi All

Some help with the following would be really appreciated as I don’t want to waste time progressing only to find that there’s something seriously amiss with my data. Below is the short batch script I’ve used on a 454 data set beginning with an sff file provided directly by the sequence provider. I’ve run it both without flipping in the trim.seqs stage (first half below, in bold), but as I was not happy with the result I’ve tried renaming the original sff file and rerunning using the flip=T parameter. The results of the latter (as shown below) are nonsense. I’ve merely pasted the summary of the last two operations for each run.

Firstly, I thought the Flip=T parameter tries both flipping and not flipping to see which is best for each sequence? If so I would expect the outputs to be similar, unless of course not flipping doesn’t work at all. In fact the results below show relatively reasonable results from the unflipped sequences following alignment and the unique.seqs command. The exception being that while most sequences start at 1046, the max is 43116, and ends at the same.

There’s obviously something amiss here, having sequences of 0 bases doesn’t makes sense given the previous steps, but for the life of me I don’t know how this has happened. I’ve tried each step manually and repeated again as the batch below, with the same result. Luckily, running via a supercomputing network, I’ve not lost as much time as I otherwise could have.

All input gratefully received please. Thanks.

BATCH SCRIPT USED
sffinfo(sff=Kelly3832B.sff, flow=T)
trim.flows(flow=Kelly3832B.flow, oligos=Irwelloligos.txt, pdiffs=2, bdiffs=1, minflows=320, processors=12)
summary.seqs()
shhh.flows(file=Kelly3832B.flow.files)
trim.seqs(fasta=Kelly3832B.shhh.fasta, name=Kelly3832B.shhh.names, oligos=Irwelloligos.txt, maxhomop=8, pdiffs=2, bdiffs=1, minlength=200)
summary.seqs()
unique.seqs()
summary.seqs()
align.seqs(reference=silva.bacteria.fasta)
summary.seqs()
[b]Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 200 200 0 2 1
2.5%-tile: 1 210 210 0 3 2292
25%-tile: 1 258 258 0 4 22912
Median: 1 266 266 0 5 45824
75%-tile: 1 273 273 0 5 68736
97.5%-tile:1 285 285 0 6 89356
Maximum: 1 303 303 0 8 91647
Mean: 1 262.182262.182 0 4.51803

of Seqs: 91647

Output File Names: Kelly3832B.shhh.trim.summary

Accnos report generated during alignment 3kb

Summary post align.seqs
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 1044 5278 206 0 3 1532
25%-tile: 1044 5703 256 0 4 15317
Median: 1044 6098 265 0 5 30634
75%-tile: 1044 6216 272 0 5 45951
97.5%-tile:1046 6394 285 0 6 59736
Maximum: 43116 43116 303 0 8 61267 [THIS RESULT DOES NOT MAKE SENSE] :o[/b]
[b]Mean: 1132.99 6005.18 259.747 0 4.54969

of Seqs: 61267

Output File Names: Kelly3832B.shhh.trim.unique.summary[/b]

BATCH SCRIPT USED WHEN FLIP USED - RESULTS MAKE EVEN LESS SENSE THAN WHEN I DON'T FLIP sffinfo(sff=Irwell.sff, flow=T) trim.flows(flow=Irwell.flow, oligos=Irwelloligos.txt, pdiffs=2, bdiffs=1, minflows=320, processors=12) summary.seqs() shhh.flows(file=Irwell.flow.files) trim.seqs(fasta=Irwell.shhh.fasta, name=Irwell.shhh.names, oligos=Irwelloligos.txt, maxhomop=8, pdiffs=2, bdiffs=1, minlength=200, flip=T) summary.seqs() unique.seqs() summary.seqs() align.seqs(reference=silva.bacteria.fasta) summary.seqs()

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 200 200 0 2 1
2.5%-tile: 1 210 210 0 3 2292
25%-tile: 1 258 258 0 4 22914
Median: 1 266 266 0 5 45827
75%-tile: 1 273 273 0 5 68740
97.5%-tile:1 285 285 0 6 89362
Maximum: 1 303 303 0 8 91653
Mean: 1 262.182 262.182 0 4.51801

of Seqs: 91653

Output File Names: Irwell.shhh.trim.summary

Accnos report generated during alignment almost 900kb!

Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 1044 1046 1 0 1 1532
25%-tile: 1044 1065 3 0 2 15320
Median: 1044 1106 7 0 2 30640
75%-tile: 43103 43116 14 0 2 45960
97.5%-tile:43116 43116 23 0 3 59748
Maximum: 43116 43116 79 0 7 61279
Mean: 18686.2 18711.7 10.1003 0 1.92348

of Seqs: 61279

Output File Names: Irwell.shhh.trim.unique.summary

pschloss · September 26, 2016, 12:54pm

Hi Laura,

In align.seqs you should try using flip=T, not in trim.seqs. Because some of your sequences don’t align, they come back as having a length of zero to either the start or the end of hte reference alignment. For what you have, I’d suggest running screen.seqs with start=1044, end=5278. Also, in trim.flows, I’d set maxflows to the same value as minflows so that you get good denoising.

Pat

Laura_Kelly · October 6, 2016, 4:01pm

Hi Pat

Thanks for your suggestion. I’m afraid the flip=T in align also resulted in the same thing, however with your (and Sarahs suggestion, by email), I’ve managed to progress and get to the OTU stage, though have come a cropper with the make.shared command now, and confused over the groups file to be use. I’ve posted a separate question for this.
Thanks again to you and Sarah.

Topic		Replies	Views
Warning Some of your sequences generated alignments that eliminated too many bases Commands in mothur	5	1637	December 1, 2022
Flip in trim.seqs Commands in mothur	8	5974	April 17, 2014
Help with align.seqs Commands in mothur	9	11167	September 26, 2014
Align.seqs Commands in mothur	3	4136	September 7, 2010
align.seqs deletes almost all bases Commands in mothur	4	1849	June 16, 2015

Aligning producing sequences with 0 bases. Why

of Seqs: 91647

of Seqs: 61267

of Seqs: 91653

of Seqs: 61279

Related Topics