So I have a question about normalizing my data. I see from the Schloss protocol that there is a command that can generate a random subsample (the sub.sample command). Unfortunately, most of my projects have one or two groups (conditions) with significantly fewer sequences than the others. If I used the sub.sample command to subsample down to the size of the smallest group, I'd end up losing most of my data. In several projects, the majority of the conditions have 3000-8000 sequences, but I'd have to subsample down to 1000 since that's the size of the smallest group. In other cases, I would go from 3000-15000 down to less than 1000, which isn't realistic.
Does anyone have suggestions for how to handle this situation? Can I use a larger subsample size for the sub.sample command and maybe just add the smaller groups back in afterward? Is there a way to use the sub.sample command and have it keep all the sequences from groups that fail to meet the minimum size? Any other suggestions?
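One way to sketch the first option, assuming the standard mothur commands (the file name `final.shared` and the group names `smallA` and `smallB` below are placeholders for your own): drop the under-sequenced groups with remove.groups before subsampling the rest at a larger depth, and analyze the shallow groups separately. I believe sub.sample on a shared file also drops any group below the requested size on its own, so the remove.groups step mainly makes that explicit.

```
# Hypothetical mothur batch; file and group names are placeholders.
# Remove the two shallow groups, then subsample the remainder at 3000 sequences.
remove.groups(shared=final.shared, groups=smallA-smallB)
sub.sample(shared=current, size=3000)
```

The caveat is that samples subsampled at 3000 and samples kept whole at ~1000 aren't directly comparable in richness-based analyses, so the smaller groups added back in would need to be interpreted with that in mind.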
My other question (posed by my supervisor) is whether there is a way in mothur to check whether you need to normalize your data, or whether we should just assume we need to. Any advice would be appreciated. Thanks!
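As a first check, it may help simply to look at how uneven the sequencing depth is across your samples; mothur's count.groups command reports the number of sequences per group (again, the file name is a placeholder):

```
# List the number of sequences in each group of the shared file.
count.groups(shared=final.shared)
```

If the depths are all within a few-fold of each other, the case for aggressive subsampling is weaker than when they span an order of magnitude, as in your 15000-vs-1000 example.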