Test differential representation results in an independent dataset?

sapou · November 30, 2021, 3:18pm

Dear all

First of all apologies for spamming the forum but this is the only place that has the right people that can answer my question.

We recently got a review back for a paper we submitted, in which, among other things, we did a differential representation analysis using the indicspecies package.

Just a bit of background: the study examines two groups of samples coming from cows called shedders (or O157 group) and non-shedders (non-O157 group) (if the O157 pathogen is detected in the cow feces, then the animals are called shedders, if not they are called non-shedders)…

And here is the reviewer’s comment:
“The microbiome study is interesting, and again several cited studies support the idea that microbiota composition may play a role in persistence. An indicator species analysis was successful in identifying OTUs that was found in non-O157 shedding microbiomes and other found in O157 shedding genomes. Unfortunately, the suggested indicator variants were not tested on an independent non/shedding dataset, so there was no validation of the significance of the identified variants.”

Is that person -in all seriousness- asking us to go and get a different set of samples from non-O157 animals and check if the non-O157 OTUs (which we found to be representative of the non-O157 group) are still there?

Thank you all for your responses in advance, just wanna make sure I got his/her point!
P

pschloss · November 30, 2021, 6:12pm

It sounds like they are asking for that. I’m with you in thinking that this is a big ask at this point in the field’s development. I think you have a couple of options…

1. Politely tell them to go fly a kite: “Future investigations will be needed to validate these biomarkers and to determine whether they have a mechanistic role in O157 shedding”. I’d especially go this route for non-Nature/Science manuscripts.
2. Point out that the taxa you found to be important have been seen in similar roles in other studies, i.e., the results make sense.
3. Could you artificially create a held-out dataset? You could hold out 20% of your samples, run the test on the other 80%, and then see whether the results from the 80% hold up for the 20%. You could then repeat that a bunch of times. This is called cross-validation and is often used in situations like this one.
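
The held-out idea could be sketched roughly like this in Python. It is only a toy illustration under several assumptions: the function names (`indval`, `stratified_split`, `holdout_agreement`), the 0.7 score threshold, and the made-up OTU counts are all hypothetical, and the score is a simplified IndVal-style statistic (specificity times fidelity) standing in for whatever indicspecies reports, without its permutation test.

```python
import random

def indval(abund, labels, group):
    """Simplified IndVal-style score for one OTU: specificity * fidelity."""
    groups = sorted(set(labels))
    means = {}
    for g in groups:
        vals = [a for a, l in zip(abund, labels) if l == g]
        means[g] = sum(vals) / len(vals)
    total = sum(means.values())
    if total == 0:
        return 0.0  # OTU absent from every sample in this subset
    in_group = [a for a, l in zip(abund, labels) if l == group]
    specificity = means[group] / total                       # share of mean abundance
    fidelity = sum(1 for a in in_group if a > 0) / len(in_group)  # share of samples present
    return specificity * fidelity

def best_group(abund, labels):
    """Return the group with the highest score for this OTU, and that score."""
    scores = {g: indval(abund, labels, g) for g in sorted(set(labels))}
    g = max(scores, key=scores.get)
    return g, scores[g]

def stratified_split(labels, test_frac, rng):
    """Hold out test_frac of each group so both groups appear on both sides."""
    train, test = [], []
    for g in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == g]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test

def holdout_agreement(table, labels, n_repeats=50, test_frac=0.2,
                      threshold=0.7, seed=0):
    """For each OTU, count how often it is picked on the training part and
    how often the held-out part points to the same group."""
    rng = random.Random(seed)
    picked = {otu: 0 for otu in table}
    confirmed = {otu: 0 for otu in table}
    for _ in range(n_repeats):
        train, test = stratified_split(labels, test_frac, rng)
        tr_lab = [labels[i] for i in train]
        te_lab = [labels[i] for i in test]
        for otu, abund in table.items():
            g_tr, s_tr = best_group([abund[i] for i in train], tr_lab)
            if s_tr < threshold:
                continue  # not an indicator on the training samples
            picked[otu] += 1
            g_te, s_te = best_group([abund[i] for i in test], te_lab)
            if g_te == g_tr and s_te >= threshold:
                confirmed[otu] += 1
    return {otu: (picked[otu], confirmed[otu]) for otu in table}

# Made-up counts: 10 shedders followed by 10 non-shedders.
labels = ["shedder"] * 10 + ["non-shedder"] * 10
table = {
    "OTU_A": [5] * 10 + [0] * 10,  # only ever seen in shedders
    "OTU_B": [3] * 20,             # identical in every sample
}
result = holdout_agreement(table, labels)
for otu, (n_picked, n_confirmed) in result.items():
    print(otu, "picked:", n_picked, "confirmed on held-out samples:", n_confirmed)
```

An OTU that is picked in most training splits and confirmed in most of the matching held-out splits is behaving consistently within the dataset. That is weaker than validation on a truly independent set of animals, but it speaks to the same concern.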

Hope this helps a bit!
Pat

sapou · December 1, 2021, 7:29am

Yeap, nr 1 and nr 2 is what I thought as well. And yes it helps a lot! Thanks again, P.

system · December 11, 2021, 7:29am

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.
