Acyclic nucleotide flows on 454

Hi,

I have recently received a new set of 454 data where, instead of repeating (cyclic) nucleotide flows, the flows are randomized. This seems to have improved overall data output (more reads and a tighter distribution of read lengths). However, the data is no longer compatible with the current implementation of PyroNoise in Mothur. I don’t know whether this acyclic nucleotide flow on 454 is going to become common, but I wanted to check if there are plans to address this in Mothur.

Thanks very much,
Ameet

We’re on it and hope to have a modified trim.flows/shhh.flows/Lookup file for the next release.

Pat

Same here. The sequencing center said: “This data was generated using the very latest version of Roche software, v2.8 Flow B, which uses an acyclic flow pattern during sequencing, increasing overall run quality. It will also not work with Mothur sffinfo. Instead of using these scripts, most customers are opting to remove single/low abundance OTUs in a later step.”

The problem isn’t sffinfo - it’s trim.flows. This will be fixed in the next release.

most customers are opting to remove single/low abundance OTUs in a later step

And those people are foolish.

Thanks Pat. Looking forward to the update.

Re: removal of single/low abundance OTUs. Agreed, blanket removal of low abundance OTUs is a bad idea. You end up throwing out the good OTUs with the bad. It is best to do thorough quality control upstream.

In fact, several medium abundance (and possibly high abundance) OTUs can also turn out to be spurious. There is some discussion of this in our recent paper: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0043093

Best,
Ameet

Just to be perfectly clear, with release 2.8 Flow Pattern B is only designed for genomic data; Flow Pattern A is still appropriate for amplicons. Hopefully things will be fixed for amplicons with release 2.9.

I am a little confused. I was under the impression that the flow patterns A and B to be introduced in 1.30 would be as follows: order=A would rely on a repeating nucleotide flow pattern (cyclic), and order=B would be able to process files with irregular nucleotide flow patterns (acyclic). Am I mistaken?
If only flow pattern A is good for amplicon sequencing, should I extract the flow pattern from the sff file and provide this in the command? Or is this option not yet going to be available and I should use q=35, qwindowsize=50?
I tried this and it seems like the acyclic flow pattern does not really provide any more good quality reads than the cyclic flow pattern for my dataset. Acyclic flow does provide more raw reads (twice as many), but q=35, qwindowsize=50 ends up tossing 50% of the reads, which brings me back to a number of sequences equivalent to cyclic flow after quality filtering.

Thanks very much for your help.

I am a little confused. I was under the impression that the flow patterns A and B to be introduced in 1.30 would be as follows: order=A would rely on a repeating nucleotide flow pattern (cyclic), and order=B would be able to process files with irregular nucleotide flow patterns (acyclic). Am I mistaken?

This is what happens when I try to be clear so as to not offend people. The next version of mothur (Monday-ish) will do exactly as you say.
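For anyone unsure which kind of run they have, the cyclic/acyclic distinction above can be checked on the flow-order string once it has been pulled out of the sff header. This is only a minimal Python sketch, not part of mothur: the function name `is_cyclic` is made up for illustration, it assumes the classic repeating 454 cycle is `TACG`, and the acyclic string in the example is invented rather than the real Flow Pattern B order.

```python
def is_cyclic(flow_order: str, cycle: str = "TACG") -> bool:
    """Return True if flow_order is just `cycle` repeated.

    A trailing partial cycle is allowed. The default cycle "TACG" is an
    assumption about the classic repeating 454 flow order.
    """
    if not flow_order:
        return False
    # Build a repeated cycle at least as long as the input, then compare.
    repeats = len(flow_order) // len(cycle) + 1
    return (cycle * repeats)[: len(flow_order)] == flow_order


if __name__ == "__main__":
    cyclic_run = "TACG" * 200                 # 800 flows, repeating pattern
    acyclic_run = "TACGTACGTCTGAGCATCGA"      # invented irregular pattern
    print(is_cyclic(cyclic_run))
    print(is_cyclic(acyclic_run))
```

If the check fails, the data were presumably generated with an irregular flow order and would need the acyclic handling described above.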

If only flow pattern A is good for amplicon sequencing, should I extract the flow pattern from the sff file and provide this in the command? Or is this option not yet going to be available and I should use q=35, qwindowsize=50?

If you have flow pattern B data, that’s what you’ve got. You would have to re-do the run with pattern A to get pattern A data. The problem is that they’re still working out the best image analysis protocol for amplicon data when using the B pattern. At the moment, I might suggest sticking with q35w50 since the shhh.flows-based approach doesn’t appear to be any better (perhaps worse?) than just using q35w50. One benefit might be that with the shhh.flows you get longer reads, but slightly higher error rates.
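To make the q35w50 idea concrete: a window of 50 bases slides along the read, and the read is cut once the window's average quality drops below 35. The Python below is a simplified sketch of that idea under my own assumptions, not mothur's actual implementation; the function name `trim_by_window` and the handling of reads shorter than one window are hypothetical choices.

```python
def trim_by_window(quals, window=50, min_avg=35):
    """Return how many leading bases to keep for one read.

    Slides a window one base at a time and stops at the first window whose
    average quality is below min_avg. The read is kept up to the end of the
    last passing window. Simplified illustration only.
    """
    if not quals:
        return 0
    if len(quals) < window:
        # Assumption: a read shorter than one window is judged as a whole.
        return len(quals) if sum(quals) / len(quals) >= min_avg else 0
    keep = 0
    for start in range(len(quals) - window + 1):
        avg = sum(quals[start:start + window]) / window
        if avg < min_avg:
            break
        keep = start + window
    return keep


if __name__ == "__main__":
    # 100 good bases (Q40) followed by 100 poor bases (Q10).
    read = [40] * 100 + [10] * 100
    print(trim_by_window(read))  # 108: a few poor bases survive the average
```

Note that an average-based window lets a handful of low-quality bases through at the cut point, which is one reason the trimmed length is slightly past where the quality actually drops.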

I tried this and it seems like the acyclic flow pattern does not really provide any more good quality reads than the cyclic flow pattern for my dataset. Acyclic flow does provide more raw reads (twice as many), but q=35, qwindowsize=50 ends up tossing 50% of the reads, which brings me back to a number of sequences equivalent to cyclic flow after quality filtering.

But perhaps the reads are longer with the FLX+/Flow B?

This is a moving target and we’ll do our best to let everyone know what’s best when we get there. For now, it seems best to stick with pattern A when sequencing amplicons.

Thanks Pat and sorry about the confusion.

Slightly longer reads for pattern B: median length probably increased by 50 bp in my case. The downside for my particular dataset is that the first plate is pattern A and the second plate is pattern B, and all the data are to be analyzed together. So, once I get to filter.seqs with trump=., the benefit of longer reads for pattern B goes away. Live and learn.
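Why the extra length disappears: with trump=., any alignment column where at least one sequence has a "." (no data at that position) is dropped for every sequence, so the shortest reads set the final length. A toy Python sketch of that column filter, with a made-up function name `trump_filter` and invented sequences:

```python
def trump_filter(aligned, trump="."):
    """Drop every alignment column in which any sequence has `trump`.

    `aligned` is a list of equal-length aligned sequences. Toy illustration
    of the column-removal idea, not mothur's filter.seqs code.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    # Keep only columns where no sequence carries the trump character.
    keep = [i for i in range(length) if all(seq[i] != trump for seq in aligned)]
    return ["".join(seq[i] for i in keep) for seq in aligned]


if __name__ == "__main__":
    pattern_a_read = "ACGTAC...."   # shorter read, padded with '.'
    pattern_b_read = "ACGTACGGTA"   # longer read
    print(trump_filter([pattern_a_read, pattern_b_read]))
    # ['ACGTAC', 'ACGTAC']: the longer read loses its extra bases
```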

Thanks again for your help.

Hi, my apologies if my question does not exactly fit this thread, but I couldn't find a better one.

I'm not experienced and it's been more than a year since I last used Mothur. I now have to deal with old 454 data. When we obtained the sequences, we had to repeat some samples, and because we did not have the money to go back to the same company, we generated the PCR products with the same barcodes as the first time (in all cases fragments of about 700 bp) and sent them to be sequenced through a cooperation with a university (they told us they had leftovers of some kits and that they would use FLX+). Months later we got short sequences, which we assumed were due to old kits, etc.

Now, running sffinfo, we realize that there is no manifest file and that there are no more than 800 flows, so we think they used an old Titanium machine.
Therefore, I cannot denoise and process all the samples from the project together.

Is there any chance to denoise the samples in two separate batches, using the corresponding lookup files and number of flows for trimming, and then merge both output files to continue processing the improved sequences all together? Of course, at the point of trimming the alignment we'll have to go with the shortest fragments and lose the information from the longer FLX+ sequences.

I know it's not the best situation, but I truly need to use these data and work with all the samples together to compare them with each other and with the metadata we have. I am trying to find the best way, as I cannot sequence again.

Thanks!!!

You could set all the minflows/maxflows values to 450. Alternatively, you could probably resequence the samples on a MiSeq for much less than the first go around.

Pat
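As a rough picture of what setting minflows and maxflows to the same value does: reads with fewer usable flows than the minimum are dropped, and everything else is cut to the maximum, so both batches end up with flowgrams of identical length. The Python below is only a sketch of that length logic under stated assumptions. The function name `trim_flowgrams` is hypothetical, it treats the full length of each flowgram as its usable length, and it ignores the signal and noise checks and barcode/primer handling that the real trim.flows step performs.

```python
def trim_flowgrams(flowgrams, minflows=450, maxflows=450):
    """Keep flowgrams with at least `minflows` flows, truncated to `maxflows`.

    `flowgrams` is a list of lists of flow intensities. Assumption: the whole
    list counts as usable flows, which real data would not guarantee.
    """
    if minflows > maxflows:
        raise ValueError("minflows must not exceed maxflows")
    kept = []
    for flows in flowgrams:
        if len(flows) < minflows:
            continue  # too short to denoise alongside the others
        kept.append(flows[:maxflows])
    return kept


if __name__ == "__main__":
    titanium_read = [1.0] * 800    # older run, 800 flows
    flx_plus_read = [1.0] * 1600   # newer run, more flows
    short_read = [1.0] * 300       # would be discarded
    trimmed = trim_flowgrams([titanium_read, flx_plus_read, short_read])
    print([len(f) for f in trimmed])  # [450, 450]
```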