other options (e.g. median and mean) for subsampling

ch3coch3 · June 19, 2017, 10:14pm

Hi,

I read a paper evaluating sub-sampling methods: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3233110/. The authors suggested that sub-sampling to the median number of reads was more accurate than sub-sampling to the minimum number.

I thought I would start a friendly discussion here about the rationale and reasoning behind sub-sampling methods.

I suppose I could use Python to “recode” the data for median sub-sampling, as mentioned in the paper. mothur would sub-sample to the minimum size. I’m more interested in the rationale.

I would appreciate any opinion.

Thanks

pschloss · June 26, 2017, 1:18pm

Subsampling to the median makes zero sense to me. It would mean upsampling the samples that have fewer than the median number of sequences, effectively making up data. People should pick an acceptable threshold and rarefy to that number of sequences.

Pat
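
To make the point above concrete, here is a minimal Python sketch of rarefying to a chosen threshold. It is not how mothur implements sub.sample; the function names (`rarefy`, `rarefy_all`), the toy OTU counts, and the threshold of 60 are all hypothetical and only for illustration. It assumes each sample is a dict of OTU label to read count, and that samples below the threshold are dropped rather than upsampled.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample's OTU counts."""
    total = sum(counts.values())
    if depth > total:
        # Reaching `depth` here would require drawing with replacement,
        # i.e. duplicating reads that were never observed.
        raise ValueError(f"sample has {total} reads, cannot rarefy to {depth}")
    # Expand counts into one label per read, then sample without replacement.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, depth))


def rarefy_all(samples, depth, seed=0):
    """Rarefy every sample with at least `depth` reads; drop the rest."""
    rng = random.Random(seed)
    kept = {}
    dropped = []
    for name, counts in samples.items():
        if sum(counts.values()) < depth:
            dropped.append(name)
        else:
            kept[name] = rarefy(counts, depth, rng)
    return kept, dropped


# Hypothetical toy data: OTU counts per sample.
samples = {
    "A": {"otu1": 50, "otu2": 30, "otu3": 20},  # 100 reads
    "B": {"otu1": 10, "otu2": 5},               # 15 reads
    "C": {"otu2": 40, "otu3": 60},              # 100 reads
}

# Threshold chosen by hand for this example.
kept, dropped = rarefy_all(samples, depth=60)
print("kept:", {name: dict(c) for name, c in kept.items()})
print("dropped:", dropped)
```

In this toy data the median depth is 100 reads, so sample B (15 reads) could only reach it by sampling with replacement. With a fixed threshold, B is simply reported as dropped and every kept sample ends up with the same number of reads.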
