This past weekend I was analyzing samples (190 samples, 2x250 MiSeq data, 14.5M reads altogether) pretty much in real time for a bioblitz (whoohoo, so much fun). Some of the samples took forever to get through chimera checking because, even after pre-clustering at 2 diffs, some samples still had >30k seqs. Since I was going to subsample the final data to 10k reads per sample anyway, what do you think about subsampling all samples down to some number of reads (like 25 or 30k) at the beginning of processing, to reduce the computational weight of those samples that just happen to come out of the sequencer with 100k reads?
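To make it concrete, here's the kind of thing I mean, as a quick standalone sketch rather than actual pipeline code (in mothur itself I'd presumably just call sub.sample with persample=T early on). The file naming, the glob pattern, and the 25k cap are all placeholders:

```python
# Rough sketch of capping each sample's read count up front. This is NOT
# mothur's sub.sample; it's a generic stand-in. File naming conventions
# and the 25k cap are made-up for illustration.
import glob
import random

MAX_READS = 25_000  # leaves plenty of headroom above the final 10k/sample

def read_fastq(path):
    """Return a list of FASTQ records, each a tuple of 4 lines."""
    records = []
    with open(path) as fh:
        while True:
            rec = [fh.readline() for _ in range(4)]
            if not rec[0]:
                break
            records.append(tuple(rec))
    return records

def write_fastq(path, records):
    with open(path, "w") as out:
        for rec in records:
            out.writelines(rec)

for r1 in glob.glob("*_R1.fastq"):  # hypothetical per-sample naming
    r2 = r1.replace("_R1", "_R2")
    fwd, rev = read_fastq(r1), read_fastq(r2)
    if len(fwd) > MAX_READS:
        # pick the SAME indices from both files so read pairs stay in sync
        keep = sorted(random.sample(range(len(fwd)), MAX_READS))
        fwd = [fwd[i] for i in keep]
        rev = [rev[i] for i in keep]
    write_fastq(r1.replace(".fastq", ".sub.fastq"), fwd)
    write_fastq(r2.replace(".fastq", ".sub.fastq"), rev)
```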
That’s probably a reasonable thing to do. Weren’t you telling me that you had a great normalization method? :lol:
hey now!!
Maybe I should have said: a pretty-decent-and-really-cheap-since-the-instrument-is-already-bought normalization method.