What's the best experimental design for targeted metagenomics?

brick1233 · October 18, 2025, 6:34pm

Hey everyone,

I’m curious what you think the best possible experimental design is for an amplicon-based study. Mock communities are great (not that I’ve yet had the pleasure of using one), but I’m wondering whether spike-ins could also improve the reliability of the data.

Just wondering what your “ideal” experiment looks like, especially if the goal is to combine independent experiments.

Thanks!

pschloss · October 20, 2025, 1:16pm

Hey -

Spike-ins are an interesting idea. The challenge is deciding what you are going to spike in: cells or DNA? If cells, then you’re assuming that they will attach to the environmental matrix and lyse as efficiently as if they had grown there natively. If DNA, then you lose the ability to assess extraction efficiency. Ultimately, even if you look past these problems, spike-ins are best suited to quantification, and you can get much better quantification by doing qPCR with universal primers. If your question is about comparing your data to someone else’s, I don’t think spike-ins will help.
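To make the quantification point concrete, here is a minimal sketch of the two scaling calculations, assuming (a) a DNA spike-in added at a known copy number and (b) a separate qPCR estimate of total 16S copies for the same sample. The function names, the dictionary layout, and the numbers are hypothetical. Both calculations inherit the caveats above (a DNA spike-in says nothing about extraction efficiency), and neither accounts for differences in 16S copy number among taxa.

```python
# Hypothetical helpers: counts is {taxon_name: read_count} for ONE sample.

def absolute_from_spike(counts, spike_name, spike_copies_added):
    """Estimate copies per taxon from a spike-in of known copy number.

    Assumes the spike-in was amplified and sequenced with the same
    efficiency as the native taxa (a strong assumption).
    """
    spike_reads = counts[spike_name]
    if spike_reads == 0:
        raise ValueError("spike-in not detected; cannot scale this sample")
    copies_per_read = spike_copies_added / spike_reads
    # The spike-in itself is dropped from the returned estimates.
    return {taxon: reads * copies_per_read
            for taxon, reads in counts.items() if taxon != spike_name}


def absolute_from_qpcr(counts, total_16s_copies):
    """Scale relative abundances by a qPCR estimate of total 16S copies."""
    total_reads = sum(counts.values())
    return {taxon: reads / total_reads * total_16s_copies
            for taxon, reads in counts.items()}
```

For example, with 100 spike-in reads from 10,000 added copies, each read corresponds to 100 copies, so a taxon with 600 reads is estimated at 60,000 copies.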

For comparing independent experiments, I would strongly encourage you to analyze each dataset independently and then compare the results. You cannot use someone else’s data as your control, since those data were generated by different methods, by different people, with different equipment. If the signal is strong enough to be real, you should see it across studies. We’ve taken this approach in our work on obesity and colon cancer.
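One way to formalize "analyze each study on its own, then see whether the signal holds up" is sketched below. It assumes each study reports a per-sample summary metric (for example, a diversity index) for cases and controls, computes a standardized mean difference (Cohen's d) within each study, checks how many studies agree in direction, and pools the estimates with fixed-effect inverse-variance weights. The function names and the choice of Cohen's d with fixed-effect pooling are illustrative assumptions, not necessarily what was done in the obesity or colon cancer work; a random-effects model is often more appropriate when studies differ in methods.

```python
import math
import statistics


def cohens_d(cases, controls):
    """Standardized mean difference (cases minus controls) and its approximate variance."""
    n1, n2 = len(cases), len(controls)
    s1, s2 = statistics.stdev(cases), statistics.stdev(controls)
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (statistics.mean(cases) - statistics.mean(controls)) / pooled_sd
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return d, var_d


def combine_studies(studies):
    """studies: list of (cases, controls) tuples, one per independently analyzed study.

    Returns (per-study effects, pooled effect, pooled standard error,
    number of studies whose effect has the same sign as the pooled effect).
    """
    effects = [cohens_d(cases, controls) for cases, controls in studies]
    weights = [1 / var_d for _, var_d in effects]  # inverse-variance weights
    pooled = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    same_sign = sum(1 for d, _ in effects if (d > 0) == (pooled > 0))
    return [d for d, _ in effects], pooled, se, same_sign
```

Reporting the per-study effects alongside the pooled estimate keeps the emphasis on whether each study shows the signal on its own, rather than on a single merged dataset.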

Hope this helps,
Pat

brick1233 · October 30, 2025, 8:57pm

Hey, Pat

Thanks for the considerations. I’m designing a study that will likely involve multiple experiments. Do you think it’s possible to ensure that each experiment’s results are directly comparable? Even if the samples were collected at different times or in different experiments, couldn’t they still be combined if they were all sequenced together?
Additionally, does anyone think HarmonizR* would be a viable batch-correction tool here, i.e., with representative OTU sequences as the features and their abundances as the values? I mention HarmonizR because it was designed with missing data in mind (i.e., some representative sequences may be absent from some samples). In this context, a mock community could serve as a reference batch. I’m not sure, so any thoughts on this would be appreciated.

*: HarmonizR basically applies sva::ComBat() or limma::removeBatchEffect() to each feature (representative sequence).
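To think through the reference-batch idea, here is a deliberately simplified sketch: log-transform abundances, then, feature by feature and using only the non-missing values, shift each batch so its mean matches the reference batch's mean. This is a location-only adjustment loosely in the spirit of ComBat's reference-batch and mean-only options. It omits ComBat's empirical Bayes shrinkage and is not HarmonizR's actual algorithm. The function name, the `None`-for-missing convention, and the pseudocount are assumptions for illustration.

```python
import math
from statistics import mean


def reference_batch_shift(abundance, batch_of, ref_batch, pseudocount=1.0):
    """Per-feature, location-only batch adjustment toward a reference batch.

    abundance: {feature: {sample: count or None}}  (None = feature missing)
    batch_of:  {sample: batch label}
    Returns {feature: {sample: adjusted log abundance}}. Missing values are
    dropped; features never observed in the reference batch are returned on
    the log scale but left uncorrected.
    """
    corrected = {}
    for feature, values in abundance.items():
        # Log-transform only the observed (non-missing) values.
        logged = {s: math.log(v + pseudocount)
                  for s, v in values.items() if v is not None}
        by_batch = {}
        for s, x in logged.items():
            by_batch.setdefault(batch_of[s], []).append(x)
        if ref_batch not in by_batch:
            corrected[feature] = logged  # nothing to anchor to
            continue
        ref_mean = mean(by_batch[ref_batch])
        # Shift each batch so its mean equals the reference batch's mean.
        shift = {b: ref_mean - mean(xs) for b, xs in by_batch.items()}
        corrected[feature] = {s: x + shift[batch_of[s]] for s, x in logged.items()}
    return corrected
```

One consequence worth checking before adopting a mock community as the reference: it contains only its member sequences, so environmental OTUs never appear in the reference batch and, in this sketch, are left uncorrected.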
