I’ve run the SOP on several titanium datasets and have gotten substantially reduced numbers of reads (e.g. one set starts with 120k reads and ends with 28k). So I’ve backed up through the SOP to determine where I’m losing the reads. Searching the uncorrected fasta, I find 106k exact matches to the primer and a total of 103k exact matches to the barcodes. Increasing the pdiffs allowed didn’t change the resulting number of sequences much. Finally, I tried running trim.flows without the oligos file, using only the optional arguments minflows=450 (or 360) and maxflows=450 (or 720). Even that very minimal quality trimming resulted in a dramatic reduction: 45k sequences for 450 flows and 53k for the Quince parameters. This seems really high to me. Is it? Or is this just the cost of high-throughput sequencing, that half the reads fail the most basic quality control?
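In case the exact commands matter, this is roughly what I ran (reconstructed from memory; mydata.flow is a placeholder for my actual flow file):

  mothur "#trim.flows(flow=mydata.flow, minflows=450, maxflows=450)"
  mothur "#trim.flows(flow=mydata.flow, minflows=360, maxflows=720)"   # Quince parameters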
ETA: I just checked with a combination of gawk and grep, and 111k sequences in that flow file are longer than 450 flows, so why did they get thrown out by trim.flows?
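The check was something along these lines (a rough sketch, not my exact command; I’m assuming the second column of the flow file is the per-read flow count and the first line is the header giving the total number of flows):

  # count reads whose reported flow count exceeds 450
  gawk 'NR > 1 && $2 > 450' mydata.flow | wc -l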