Choosing a representative sample

Another one in my series of questions on ‘how could I get this done’:

I have 16S data from 6 samples. These samples are water samples from different layers within a geological formation. I looked at both abundant and shared OTUs of these 6 samples.

Now, I would like to know which of these samples is most representative of the geological formation. It would be great if I could ‘choose’ one layer to sample water from, to use in future lab (batch) tests, and call the community of this sample the most ‘representative’ of the formation. This sample would have the overall highest number of species within most of the shared OTUs and/or hold most of the abundant OTUs. But I have absolutely no idea how to find this one with Mothur.

It could also turn out that none of them is really suitable and that I should always use a mixture of samples. I know this would be the safest anyway. But as the sampling at 200 m depth is quite tedious, it would be so much better if I could choose only one. But which?

I will be thinking about it myself, of course, but if anyone has a killer idea, I would appreciate it…


How about calculating a beta-diversity metric between all of the samples and seeing which sample has the shortest overall distance to all of the other samples? This is what we do with get.oturep, and this strategy should work for your situation as well.
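
To make the idea concrete, here is a minimal Python sketch of the "closest to everyone else" selection. It is only an illustration, not what mothur does internally: the choice of Bray-Curtis on relative abundances, the use of the mean (rather than the sum or median) of the distances, the function names and the count table are all assumptions made up for the example. In practice the distance matrix would more likely come from mothur itself (dist.shared looks like the relevant command, but check its documentation for the calculators it supports).

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 - 2.0 * shared / total if total else 0.0


def most_central_sample(counts):
    """Return the sample with the smallest mean distance to all other samples,
    together with the mean distance of every sample.

    counts: dict mapping sample name -> list of OTU counts (same OTU order).
    """
    # Convert counts to relative abundances so sequencing depth does not dominate.
    rel = {}
    for name, row in counts.items():
        total = sum(row)
        if total == 0:
            raise ValueError(f"sample {name!r} has no reads")
        rel[name] = [c / total for c in row]

    # Mean distance from each sample to every other sample.
    mean_dist = {}
    for name in rel:
        others = [bray_curtis(rel[name], rel[o]) for o in rel if o != name]
        mean_dist[name] = sum(others) / len(others)

    best = min(mean_dist, key=mean_dist.get)
    return best, mean_dist


# Hypothetical counts for illustration only: 3 layers x 4 OTUs.
example = {
    "layer_A": [90, 10, 0, 0],
    "layer_B": [50, 30, 20, 0],
    "layer_C": [10, 30, 40, 20],
}
best, mean_dist = most_central_sample(example)
print(best)
for name, d in mean_dist.items():
    print(name, round(d, 3))
```

With these made-up numbers, layer_B comes out as the most central sample because it sits between the other two. If the smallest mean distance is still large, that would suggest that no single layer represents the formation well and that a mixture is the safer choice.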