subsample reads before aligning/chimera checking

This past weekend I was analyzing samples (190 samples, 2x250 MiSeq data, 14.5M reads altogether) pretty much in real time for a bioblitz (woohoo, so much fun). Some of the samples took forever to get through chimera checking because, even after pre-clustering at 2 diffs, they still had >30k seqs. Since I was going to subsample the final data to 10k reads per sample anyway, what do you think about subsampling all samples down to some number of reads (like 25k or 30k) at the beginning of processing, to lighten the computational load of those samples that just happen to come off the sequencer with 100k reads?
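To make the idea concrete, here is a rough sketch in Python of capping reads per sample up front. This is not mothur code, just an illustration: the function name `cap_reads_per_sample`, the read-to-sample dictionary, and the `seed` argument are all made up for the example, and it assumes you already have a mapping from read name to sample. Samples at or below the cap are left alone; bigger ones are randomly sampled without replacement down to the cap.

```python
import random
from collections import defaultdict


def cap_reads_per_sample(read_to_sample, cap, seed=0):
    """Return the set of read names kept after capping each sample at `cap` reads.

    read_to_sample: dict mapping read name -> sample name (hypothetical input format).
    cap: maximum number of reads to keep per sample (e.g. 25000 or 30000).
    seed: fixed seed so the same reads are picked on every run.
    """
    rng = random.Random(seed)

    # Group read names by the sample they belong to.
    by_sample = defaultdict(list)
    for read, sample in read_to_sample.items():
        by_sample[sample].append(read)

    kept = set()
    for sample, reads in by_sample.items():
        if len(reads) > cap:
            # Oversized sample: draw `cap` reads at random, without replacement.
            reads = rng.sample(reads, cap)
        # Samples at or below the cap keep all of their reads.
        kept.update(reads)
    return kept


if __name__ == "__main__":
    # Toy data: one deep sample (100 reads) and one shallow sample (10 reads).
    toy = {f"deep_{i}": "deep" for i in range(100)}
    toy.update({f"shallow_{i}": "shallow" for i in range(10)})
    kept = cap_reads_per_sample(toy, cap=25)
    print(len(kept), "reads kept in total")
```

The kept read names would then be used to filter the fasta and group files before alignment and chimera checking. The cap has to stay above the final 10k target, since some reads will still be lost to screening and chimera removal along the way.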

That’s probably a reasonable thing to do. Weren’t you telling me that you had a great normalization method? :lol:

hey now!!

Maybe I should have said, pretty-decent-and-really-cheap-since-the-instrument-is-already-bought normalization method :wink: