Sub-sampling

jgilbreath · November 13, 2011, 11:54pm

Hello everyone,

I am currently working with a V6 data set that was sequenced on an Illumina HiSeq. Because of the large number of sequences we have for each sample, I am having difficulty analyzing the full data set in mothur. A colleague of mine suggested using a sub-sampling strategy to decrease the number of sequences for each sample. This approach seems to work well and has eliminated the problems I was having in mothur. The major question I have about this strategy is: at what depth will sub-sampling accurately represent the full data set (or at least come reasonably close)? Does anyone have suggestions as to which measurements to go by when comparing the sub-sampled data to the full data set?
The Good’s coverage calculator seems like an obvious place to start, but I’m not sure what else to compare.
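
To make the comparison concrete, here is a minimal Python sketch of that idea, done outside mothur: compute Good's coverage (1 minus the number of singleton OTUs divided by the total number of sequences) on the full sample, then on random sub-samples at several depths. The OTU counts, the depths, and the function names `goods_coverage` and `subsample` are made up for illustration, and it assumes per-sample OTU abundance counts are already available.

```python
import random
from collections import Counter


def goods_coverage(otu_counts):
    """Good's coverage: 1 - (number of singleton OTUs / total sequences)."""
    total = sum(otu_counts)
    if total == 0:
        raise ValueError("sample contains no sequences")
    singletons = sum(1 for c in otu_counts if c == 1)
    return 1.0 - singletons / total


def subsample(otu_counts, depth, seed=0):
    """Draw `depth` sequences without replacement; return per-OTU counts."""
    # Expand the counts into one entry per sequence, labelled by OTU index.
    pool = [otu for otu, c in enumerate(otu_counts) for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of sequences")
    drawn = Counter(random.Random(seed).sample(pool, depth))
    return [drawn[otu] for otu in range(len(otu_counts))]


# Hypothetical OTU abundances for one sample (1000 sequences in total).
full = [500, 300, 120, 50, 20, 5, 2, 1, 1, 1]

print("full data set:", round(goods_coverage(full), 4))
for depth in (100, 250, 500):  # illustrative depths only
    sub = subsample(full, depth, seed=42)
    observed = sum(1 for c in sub if c > 0)
    print(depth, "seqs:", observed, "OTUs, coverage",
          round(goods_coverage(sub), 4))
```

If coverage and the number of observed OTUs level off as depth increases, that suggests the sub-sample is capturing most of what the full sample contains. Repeating the draw with several seeds gives a sense of how much the values vary by chance.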

Thanks

Jeremy

gwidmer · December 2, 2011, 7:05pm

Jeremy, in case you are still searching for an answer, this could help:

Aguirre de Cárcer D, Denman SE, McSweeney C, Morrison M. Evaluation of
subsampling-based normalization strategies for tagged high-throughput sequencing
datasets from gut microbiomes. Appl Environ Microbiol. 2011 Oct 7.

Giovanni

jgilbreath · December 2, 2011, 11:49pm

Thanks!
