Test differential representation results in an independent dataset?

Dear all

First of all apologies for spamming the forum but this is the only place that has the right people that can answer my question.

We recently received the reviews for a paper we submitted. Among other things, the paper includes a differential representation analysis done with the indicspecies package.

Just a bit of background: the study examines two groups of samples from cows, called shedders (the O157 group) and non-shedders (the non-O157 group). If the O157 pathogen is detected in a cow's feces, the animal is called a shedder; if not, it is called a non-shedder…

And here is the reviewer’s comment:
“The microbiome study is interesting, and again several cited studies support the idea that microbiota composition may play a role in persistence. An indicator species analysis was successful in identifying OTUs that was found in non-O157 shedding microbiomes and other found in O157 shedding genomes. Unfortunately, the suggested indicator variants were not tested on an independent non/shedding dataset, so there was no validation of the significance of the identified variants.”

Is that person -in all seriousness- asking us to go and get a different set of samples from non-O157 animals and check if the non-O157 OTUs (which we found to be representative of the non-O157 group) are still there?

Thank you all in advance for your responses. I just want to make sure I got his/her point!

It sounds like they are asking for that. I’m with you in thinking that this is a big ask at this point in the field’s development. I think you have a couple of options…

  1. Politely, tell them to go fly a kite… “Future investigations will be needed to validate these biomarkers and to determine whether they have a mechanistic role in O157 shedding”. I’d especially go this route for non-Nature/Science manuscripts

  2. Point out that the taxa you found to be important have been seen in similar roles in other studies - i.e., the results make sense.

  3. Could you artificially create a held-out dataset? You could hold out 20% of your samples, run the test on the other 80%, and then see whether the results from the 80% hold up in the 20%. You could then repeat that a bunch of times. This is called cross-validation and is often used in situations like this one.
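
For option 3, here is a rough sketch of the repeated 80/20 hold-out idea in plain Python. Everything in it is an assumption made for illustration: the data are made up, the function names are hypothetical, and the selection rule (a gap in how often an OTU is present in each group) is only a stand-in for whatever indicator test you actually ran in indicspecies. In practice you would rerun your real analysis on each 80% split and check its hits on the matching 20%.

```python
import random

def prevalence(rows, otu):
    """Fraction of samples in which the OTU has a non-zero count."""
    return sum(1 for r in rows if r[otu] > 0) / len(rows)

def prevalence_gap(rows, labels, otu):
    """Prevalence in shedders minus prevalence in non-shedders."""
    shed = [r for r, g in zip(rows, labels) if g == "shedder"]
    non = [r for r, g in zip(rows, labels) if g == "non-shedder"]
    return prevalence(shed, otu) - prevalence(non, otu)

def stratified_split(labels, test_fraction, rng):
    """Hold out test_fraction of each group so both groups land in both parts."""
    train, test = [], []
    for group in sorted(set(labels)):
        idx = [i for i, g in enumerate(labels) if g == group]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test

def repeated_holdout(rows, labels, otus, n_repeats=100, test_fraction=0.2,
                     min_gap=0.5, seed=0):
    """Count how often each OTU is picked on the 80% and agrees on the 20%."""
    rng = random.Random(seed)
    selected = {o: 0 for o in otus}
    confirmed = {o: 0 for o in otus}
    for _ in range(n_repeats):
        train, test = stratified_split(labels, test_fraction, rng)
        tr_rows = [rows[i] for i in train]
        tr_labels = [labels[i] for i in train]
        te_rows = [rows[i] for i in test]
        te_labels = [labels[i] for i in test]
        for o in otus:
            gap_train = prevalence_gap(tr_rows, tr_labels, o)
            if abs(gap_train) < min_gap:
                continue  # stand-in rule: not an indicator on this split
            selected[o] += 1
            gap_test = prevalence_gap(te_rows, te_labels, o)
            if gap_train * gap_test > 0:  # same direction in held-out samples
                confirmed[o] += 1
    return selected, confirmed

# Made-up toy data: OTU_A only in shedders, OTU_B in every sample.
rows = ([{"OTU_A": 5, "OTU_B": 3} for _ in range(10)]
        + [{"OTU_A": 0, "OTU_B": 4} for _ in range(10)])
labels = ["shedder"] * 10 + ["non-shedder"] * 10

selected, confirmed = repeated_holdout(rows, labels, ["OTU_A", "OTU_B"],
                                       n_repeats=20)
print("times selected on the 80%:", selected)
print("times confirmed on the 20%:", confirmed)
```

An OTU that is selected in most splits and confirmed nearly every time it is selected is the kind of result you could report back to the reviewer. One that is picked only occasionally, or often fails on the held-out samples, deserves more caution.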

Hope this helps a bit!


Yep, options 1 and 2 are what I thought as well. And yes, it helps a lot! Thanks again, P.
