I’m a Mothur beginner and I’m struggling to determine the most appropriate minflows and maxflows values (trim.flows command) for my case: Titanium data obtained with primers 343f/806r (theoretical amplicon length = 463 bp), with the majority of my sequences being between 400 and 450 bp long.
According to the SOP, “trimming each flowgram to 450 flows does the best job of reducing the sequencing noise. However, this assumes that your amplicon is longer than 450 flows (i.e. Bacterial primers 27F/519R which amplify V1-3 only requires 350-400 flows to sequence depending on the exact sequence). Chris Quince suggests a minimum number of flows of 360 and a maximum of 720”.
Using the default values (450 flows), half of my sequences end up in the scrap file; when using minflows=360, maxflows=720, 35% of the sequences are removed.
Using 360/720 makes more sense for my data than using 450 but I’m wondering if it would be more appropriate to use a lower value for minflows…
Many thanks for your help
The length of your fragment doesn’t matter for this. If you have Titanium data, just stick with the default minflows/maxflows=450. This threshold is set because the sequence quality of the platform/chemistry drops off with length. And it was derived using A LOT of Titanium data on fragments similar in length to yours. So please don’t try to reinvent the wheel, you will certainly have computational problems further down the pipeline and you’ll be back here asking why command X doesn’t work :). Also, an important thing that you and others are confusing is that the number of flows does not equal the number of bases. Using 450 flows will get you about 270 bp. If you use 360/720 then you will have very mixed length fragments between ~200 and 360 bp. If you look at our recent papers you’ll see that having fragments of varying lengths is a bad idea because the 16S rRNA gene does not evolve uniformly along its length. Also, this compromises the efficiency of PyroNoise because it will think all those varying length sequences are different, when many of them are really the same. If you look at our 2012 PLoS ONE paper you’ll see that using 360/720 with pyronoise/shhh.flows is really a waste of time because there’s so little denoising. Just stick with 450.
Hope this helps,
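To illustrate the point about mixed lengths, here is a minimal Python sketch of a min/max flow filter. It is a simplified model, not mothur's actual trim.flows implementation: the function name `trim_read` and the per-read "usable flows" counts are made up for illustration, and the real command also applies signal and noise checks that are not modeled here.

```python
# Simplified model of a min/max flow filter (NOT mothur's actual code).
# `usable_flows` is a hypothetical count of flows before a read's signal degrades.

def trim_read(usable_flows, minflows, maxflows):
    """Return the trimmed flow count, or None if the read would be scrapped."""
    if usable_flows < minflows:
        return None  # too short: would end up in the scrap file
    return min(usable_flows, maxflows)  # otherwise truncate to maxflows

reads = [300, 380, 450, 500, 720, 800]  # made-up usable flow counts

fixed = [trim_read(n, 450, 450) for n in reads]
ranged = [trim_read(n, 360, 720) for n in reads]

kept_fixed = [n for n in fixed if n is not None]
kept_ranged = [n for n in ranged if n is not None]

print(kept_fixed)   # every kept read has the same number of flows
print(kept_ranged)  # kept reads vary in length
```

Under these assumptions, minflows=maxflows=450 keeps fewer reads but leaves them all the same length, while 360/720 keeps more reads of varying length.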
Thanks a lot for your quick reply - much appreciated.
I guess you mean your 2011 PLoS ONE paper (“reducing the effects of PCR amplification”)? I did read it carefully and understood the benefit of sticking with 450 flows, but then I also read other material implying that the number of flows could be adjusted depending on amplicon length… Anyway, I will stick with 450. Thanks for the clarification!
However, what do you think about the fact that when running trim.flows with the default values (450 flows), half of my sequences end up in the scrap file (mydata.trim.flow contains 62478 sequences and mydata.scrap.flow contains 65895)?
In your paper, the percentage of removed sequences is much lower!
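For reference, the scrap rate implied by the two counts quoted above can be worked out with a few lines of Python. The variable names are only illustrative; the numbers are the ones from the post.

```python
# Counts reported in the post above.
kept = 62478      # sequences in mydata.trim.flow
scrapped = 65895  # sequences in mydata.scrap.flow

total = kept + scrapped
scrap_fraction = scrapped / total

print(f"{total} reads, {scrap_fraction:.1%} scrapped")
```

That works out to a little over half of the reads being scrapped.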
Right - that’s the one.
If you’re losing that many sequences, I suggest talking with your sequence provider to see what’s going on because it indicates that the run was probably bad.
I also sequence V1-3, and this is the one place where I don’t follow the SOP. I had to reduce my flows to 300 because I was losing too many sequences: they were completely sequenced before reaching the recommended 450 flows.
(I chose 300 after counting the number of flows to expect in a complete V1-3 sequence in a few phyla, counting each homopolymer as 1 flow.)
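The kind of counting described above can be sketched in Python. This is only a rough illustration: it assumes a cyclic nucleotide flow order of TACG, the function name `flows_needed` is made up, and it ignores the key, barcode and primer flows that a real flowgram also contains.

```python
FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order

def flows_needed(seq, flow_order=FLOW_ORDER):
    """Count flows needed to read `seq`; a homopolymer run uses a single flow."""
    seq = seq.upper()
    unknown = set(seq) - set(flow_order)
    if unknown:
        # Ambiguous bases (e.g. N) can never be matched by a flow.
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    flows = 0
    i = 0
    while i < len(seq):
        base = flow_order[flows % len(flow_order)]
        flows += 1
        # A positive flow reads the whole run of identical bases at once.
        while i < len(seq) and seq[i] == base:
            i += 1
    return flows

print(flows_needed("TTTT"))  # one homopolymer run, read in a single flow
print(flows_needed("GT"))    # G is only read on the 4th flow, T on the 5th
```

Running this over a few full-length reference amplicons gives an estimate of how many flows a complete read needs, which is why the number of flows is not the same as the number of bases.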
I guess all I can say is that we included V1-3 data in the 2011 paper and didn’t see this.