I’m hoping to get an opinion on subsampling.
I have three sources of digesta that I’ve run separately. They were all subsampled to 11000 like in this example: sub.sample(shared=colon.an.shared, size=11000).
They could have been subsampled like this, based on the lowest reasonable number of sequences I have to deal with:
I assumed I should subsample everything to one level (11000), even though I’m not currently comparing between digesta sources (mostly because my runs get killed on my HPCC when I try to run all samples together instead of as three groups).
I do lose some mice from the analysis when I subsample to 11000.
Should I subsample to 11000, 9898, and 8751 and not lose mice from the analysis? Or should I subsample all to 11000 to keep consistency?
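To make the trade-off concrete, here is a rough Python sketch of what I mean by losing mice: any sample with fewer sequences than the subsampling size gets dropped. The mouse names and counts are made up for illustration (they are not my real data), and `samples_kept` is just a hypothetical helper, not a mothur function.

```python
def samples_kept(counts, size):
    """Return the samples with at least `size` sequences; the rest would be dropped."""
    return sorted(name for name, n in counts.items() if n >= size)


# Hypothetical per-mouse sequence counts for one digesta source (not real data).
counts = {"mouse1": 15200, "mouse2": 11000, "mouse3": 9898, "mouse4": 12750}

for size in (11000, 9898):
    kept = samples_kept(counts, size)
    dropped = sorted(set(counts) - set(kept))
    print(f"size={size}: kept {len(kept)} samples, dropped {dropped}")
```

With these made-up numbers, the lower size keeps every mouse, while 11000 drops the one mouse that falls below it.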