Testing for a batch effect


We are currently submitting a manuscript, and one of the reviewers wants to know whether there was a batch effect. To clarify, we gathered our data in two separate 454 sequencing runs and then combined the data. The reviewer suggested using the "Adonis" test. Do you have any recommendations on how to test for a batch effect?


Adonis is a function in the R library 'vegan' that performs permutational multivariate analysis of variance (PERMANOVA) on a distance matrix. It partitions the variation in your distance matrix among pre-defined groupings and uses random permutations of the group labels to judge how well a particular grouping explains the variation in your data. As far as I can tell from a quick Google search, there's no separate statistical test called 'adonis', so this is probably what they're referring to.
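To make the idea concrete, here is a rough Python sketch of a one-factor PERMANOVA, using Anderson's sums-of-squares formulation on a square distance matrix. It is not vegan's code and handles only a single grouping (adonis supports multiple terms and also reports R²). The toy 1-D "samples", the label names and the permutation count are hypothetical choices for illustration only.

```python
import random
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]

def _pairwise_sq_sum(dist: Matrix, idx: List[int]) -> float:
    """Sum of squared distances over all unordered pairs of samples in idx."""
    total = 0.0
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            total += dist[idx[a]][idx[b]] ** 2
    return total

def pseudo_f(dist: Matrix, labels: Sequence[str]) -> float:
    """PERMANOVA pseudo-F for one grouping of the samples."""
    n = len(labels)
    groups: Dict[str, List[int]] = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    a = len(groups)
    if a < 2:
        raise ValueError("need at least two groups")
    ss_total = _pairwise_sq_sum(dist, list(range(n))) / n
    # Within-group sum of squares: each group's pairs divided by its size.
    ss_within = sum(_pairwise_sq_sum(dist, m) / len(m) for m in groups.values())
    if ss_within == 0:
        return float("inf")
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist: Matrix, labels: Sequence[str],
              permutations: int = 999, seed: int = 1) -> Tuple[float, float]:
    """Return (pseudo-F, permutation p-value) for the given grouping."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any link between labels and samples
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical toy data: two tight clusters of 4 samples each.
positions = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
dist = [[abs(x - y) for y in positions] for x in positions]
treatment = ["pre"] * 4 + ["post"] * 4   # matches the clusters
run = ["run1", "run2"] * 4               # randomised across clusters
print("treatment:", permanova(dist, treatment))
print("run:", permanova(dist, run))
```

In this toy case the treatment grouping gets a large pseudo-F and a small p-value, while the run grouping, spread evenly across both clusters, gets a pseudo-F near zero and a large p-value. That is the pattern you would hope to see for a batch effect check.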

It's very simple to perform. You can import your distance matrix into R using this post, and then the vegan documentation should walk you through the rest of the analysis. If you're not familiar with R, I know a few other programs can do the analysis (for example, PRIMER). You need to run the analysis once per grouping you want to test (the sequencing run, plus whatever variables you were testing in your manuscript). What you hope to see is that your study parameters explain more of the variation in your data than the sequencing run does, and ideally that the run explains a negligible amount.
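One way to compare the "explanatory power" of each grouping is the R² that adonis reports: the share of the total sum of squared distances that lies between groups. The Python sketch below computes that quantity for one grouping at a time. The toy matrix and the grouping names are hypothetical, and in a real analysis you would still want the permutation p-value from adonis (or mothur's amova) alongside it.

```python
from typing import Dict, List, Sequence

def r_squared(dist: Sequence[Sequence[float]], labels: Sequence[str]) -> float:
    """Fraction of total variation in the distance matrix explained by labels."""
    def sq_sum(idx: List[int]) -> float:
        # Sum of squared distances over all unordered pairs in idx.
        return sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])

    n = len(labels)
    groups: Dict[str, List[int]] = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    ss_total = sq_sum(list(range(n))) / n
    ss_within = sum(sq_sum(m) / len(m) for m in groups.values())
    return 1.0 - ss_within / ss_total

# Hypothetical toy data: two tight clusters of 4 samples each.
positions = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
dist = [[abs(x - y) for y in positions] for x in positions]
groupings = {
    "treatment": ["pre"] * 4 + ["post"] * 4,
    "sequencing run": ["run1", "run2"] * 4,
}
for name, labels in groupings.items():
    print(f"{name}: R2 = {r_squared(dist, labels):.3f}")
```

Here the treatment grouping explains nearly all of the variation and the run grouping almost none, which is the comparison described above.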

Thanks for the help… I must admit I'm not well versed in R, but I will look into it as well as the PRIMER software. I've read elsewhere on the forum (http://mothur.org/forum/viewtopic.php?f=3&t=2957) that adonis is basically the same as AMOVA. Would there be any way to test for a batch effect using the amova command in mothur?

Oh yes, there's a command to do that in mothur: http://www.mothur.org/wiki/Amova. You could just group your data by plate and see what the result is.

It's a bit of an odd comment to receive. Did you have all your pre- and post-treatment samples split into separate runs, or were the samples randomised across the runs?

The samples were randomized across both runs. It is a bit of an odd comment, but we would like to make an effort to address it.

Any recommendations on how to do this? I assume I need to create a distance matrix, but I'm not sure which command to use or how to group the data.


You'll need a design file; these are pretty common for most of the statistical tests done in mothur.

Thanks for your help so far! I do know how to make design files; I'm just unclear about which calculator to use to make the distance file. I'm assuming I should use dist.seqs on my fasta file containing all of my reads (from both runs), and then use the amova command to test for differences between the two sequencing runs. My question is: how do I make a design file for this without having to individually label each read with the sequencing run it came from?

I don't know if you'd need to do that. I would think you could just repeat your per-group analysis with new sample groupings (which is how I would do it). Hopefully someone else can weigh in on this, though.
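If you go the per-sample route, the new grouping is just one line per sample stating which run it was on. The Python sketch below writes such a file. It assumes the simple two-column, tab-delimited design-file layout (sample name, then grouping); check the exact format against the design-file page on the mothur wiki for your version. The sample names and run labels are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO

def write_design(sample_to_run: Dict[str, str], handle: TextIO) -> None:
    """Write one 'sample<TAB>run' line per sample (assumed design-file layout)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for sample in sorted(sample_to_run):
        writer.writerow([sample, sample_to_run[sample]])

# Hypothetical sample names and run assignments, for illustration only.
runs = {
    "patientA_pre": "run1",
    "patientA_post": "run2",
    "patientB_pre": "run2",
    "patientB_post": "run1",
}
buf = io.StringIO()
write_design(runs, buf)
print(buf.getvalue(), end="")
```

To write a real file, pass an open handle instead, e.g. `with open("runs.design", "w", newline="") as fh: write_design(runs, fh)`. The file name here is also just an example.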

If you want a sequence-based design file for the analysis, I'm not sure how well it will play out overall, since there's already going to be a large amount of within-sample variation just due to the origin of each sequence. Probably the simplest way to do this would be to use the merge.groups command to regroup your samples into just the design-file groupings. You'd still have to set the sequencing run of each sample manually, but the run could then be applied to each sequence automatically.
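The mechanism described above (set the run once per sample, then carry it over to every sequence) can be sketched outside mothur as a relabeling of a group file. This is not what merge.groups does internally. It assumes the group file has two tab-separated columns, sequence name then sample name, and the read and sample names are hypothetical. The caveat above about within-sample variation still applies to any sequence-level test.

```python
from typing import Dict, Iterable, Iterator

def relabel_groups(group_lines: Iterable[str],
                   sample_to_run: Dict[str, str]) -> Iterator[str]:
    """Turn 'sequence<TAB>sample' lines into 'sequence<TAB>run' lines."""
    for line in group_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        seq_name, sample = line.split("\t")
        # A KeyError here means a sample was never assigned to a run.
        yield f"{seq_name}\t{sample_to_run[sample]}"

# Hypothetical run assignments and group-file lines.
runs = {"patientA_pre": "run1", "patientB_pre": "run2"}
group_lines = ["read1\tpatientA_pre\n", "read2\tpatientB_pre\n"]
for out in relabel_groups(group_lines, runs):
    print(out)
```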

Running amova should get you what you want. Like the others, I suspect you won't see anything, since you randomized the samples between runs. Good thinking!


Thanks, everyone, for your help!