trim.flows removes 50% of the seqs with only length option

I’ve run the SOP on several titanium datasets and have gotten substantially reduced numbers of reads (e.g. one set starts with 120k reads and ends with 28k). So I’ve backed up through the SOP to determine where I’m losing the reads. Searching the uncorrected fasta, I find 106k exact matches to the primer and a total of 103k exact matches to the barcodes. Increasing the pdiffs allowed didn’t change the resulting number of sequences much. Finally, I tried running trim.flows without the oligos file, using only the optional arguments minflows and maxflows — set to 450/450 in one run and to the Quince parameters (360/720) in another. Even that very minor quality trimming resulted in a dramatic reduction: 45k sequences at 450 flows and 53k with the Quince parameters. This seems really high to me. Is it? Or is this just the cost of high-throughput sequencing, that half the reads fail the most basic quality control?
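For reference, the two quality-only runs described above would look something like this in mothur (the file name is a placeholder; parameter names follow the trim.flows documentation):

```
mothur > trim.flows(flow=dataset.flow, minflows=450, maxflows=450, processors=2)
mothur > trim.flows(flow=dataset.flow, minflows=360, maxflows=720, processors=2)
```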

ETA: I just checked with a combination of gawk and grep — 111k sequences in that flow file are longer than 450 flows, so why did trim.flows throw them out??
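For what it’s worth, here is a rough Python re-creation of that gawk/grep check. It assumes (this is a guess about the layout — verify against your own file) that the first line of the .flow file is a flows-per-read header and every other line starts with `name number-of-flows flow-values…`:

```python
def count_long_flowgrams(lines, min_flows=450):
    """Count reads whose reported flow count exceeds min_flows.

    Assumes each read line looks like: "name n_flows f1 f2 f3 ..."
    (a guess at the mothur .flow layout -- check your own file).
    """
    count = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip the single-number flows-per-read header
        try:
            n_flows = int(fields[1])
        except ValueError:
            continue  # skip any line without a numeric flow count
        if n_flows > min_flows:
            count += 1
    return count
```

Note that a raw length count like this isn’t quite what trim.flows tests: as I understand it, trim.flows first truncates each flowgram at the first noisy flow (a signal in the ambiguous 0.5–0.7 range) and only then applies minflows, so a read can easily be longer than 450 flows overall and still get scrapped.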

Well, 454 is notorious for selling crappy reagents that result in bad sequence reads. I’d go back to your sequence provider.

well damn. These are legacy sequences from a former postdoc, so I probably won’t be doing any more sequencing.

Some of the sequences may just be bad, but I also realized that because I amplified V1-3 (27f/519r), my minflows needs to be lower.

Hi there,

I’m having a similar problem with a large number of sequences getting thrown out by the trim.flows command (largely due to read length). I find that the more I lower the minflows parameter, the fewer reads get scrapped… Is there a minflows value that would be considered just too low to function as quality control?