Is there a way in mothur to sub-sample using an upper instead of a lower limit? E.g. size=1500 would keep groups with sequence counts equal to or below this size. I don’t want to lose groups with fewer sequences, so I would prefer to reduce the size of the groups that have more than this. Also, is there no way of retrieving the eliminated samples from the sub.sample command?
No, we don’t offer that, since we think this is a really bad idea.
Yeah, obviously it’s not an optimal solution… but let’s say I’ve gotten a fairly skewed sequence distribution over samples and plates. The expected depth is e.g. 2500, but many samples exceed this by double or even more, while a lot of other samples have only around 1000-1500 seqs. Wouldn’t it be almost equally bad to use samples exceeding the expected depth of 2500, given that they may only be that deep because other samples on the same plate didn’t amplify well, letting them ‘steal’ more sequences? In this case, wouldn’t less be more?
The best option is to rarefy everything to 1000 sequences; otherwise you’re imputing information that doesn’t exist.
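To make the idea of rarefying to a common depth concrete, here is a minimal Python sketch. It is not mothur code and makes no claim about how mothur implements it: the `rarefy` helper, the group names and the OTU counts are all hypothetical. Every group is subsampled without replacement to `depth` reads, and any group with fewer reads is dropped rather than padded.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=None):
    """Subsample every group to exactly `depth` reads, without replacement.

    `counts` maps group name -> {otu: read count}. Groups holding fewer than
    `depth` reads are dropped. Hypothetical helper, not part of mothur.
    """
    rng = random.Random(seed)
    rarefied = {}
    for group, otu_counts in counts.items():
        # Expand the counts into one label per read so each read is equally likely.
        reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
        if len(reads) < depth:
            continue  # too shallow: excluded, never padded with invented reads
        rarefied[group] = dict(Counter(rng.sample(reads, depth)))
    return rarefied

# Made-up counts for illustration only.
counts = {
    "deep":    {"Otu1": 3000, "Otu2": 1500, "Otu3": 500},
    "medium":  {"Otu1": 800, "Otu2": 400, "Otu3": 50},
    "shallow": {"Otu1": 500, "Otu2": 200},
}
result = rarefy(counts, depth=1000, seed=1)
for group, otus in result.items():
    print(group, sum(otus.values()))
```

With a depth of 1000, the 5000-read and 1250-read groups both end up at exactly 1000 reads, while the 700-read group is left out. Capping only the deep groups would instead leave groups at different depths, which is the situation the advice above is meant to avoid.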
Ok thanks! Could you point me to the command to use? Lots of commands seem to have a rarefy flag, but I’m not sure that’s the thing I want…
rarefaction.single will work for alpha diversity. For beta-diversity you would use dist.shared(calc=…, subsample=1000, iters=???). For things like metastats you would use sub.sample.
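As far as I understand it, the subsample and iters options in dist.shared amount to repeating a random subsample several times and averaging the resulting distances. A rough Python sketch of that idea follows. It assumes Bray-Curtis as the calculator, uses made-up counts and an arbitrary iteration count, and the helper names are hypothetical; it illustrates the concept only and is not mothur’s implementation.

```python
import random
from collections import Counter

def subsample(otu_counts, depth, rng):
    """Draw `depth` reads without replacement from one group's OTU counts."""
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, depth))

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two OTU count tables (Counters)."""
    shared = sum(min(a[otu], b[otu]) for otu in set(a) | set(b))
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))

def mean_distance(x, y, depth, iters, seed=None):
    """Average the distance over `iters` independent subsamples of both groups."""
    rng = random.Random(seed)
    dists = [
        bray_curtis(subsample(x, depth, rng), subsample(y, depth, rng))
        for _ in range(iters)
    ]
    return sum(dists) / iters

# Made-up counts; iters=100 is an arbitrary choice for the illustration.
sample_a = {"Otu1": 3000, "Otu2": 1500, "Otu3": 500}
sample_b = {"Otu1": 400, "Otu2": 700, "Otu4": 150}
print(mean_distance(sample_a, sample_b, depth=1000, iters=100, seed=1))
```

Averaging over many subsamples means the result depends less on which reads happened to be drawn in any single subsample.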