Why is performing remove.groups (to remove the mock community) necessary before running Cluster.split? (Mothur MiSeq SOP tutorial)

Good day, everyone! I’d like to ask for some clarification regarding the MiSeq Mothur pipeline in Galaxy (I am following the tutorial guide from this link: 16S Microbial Analysis with mothur (extended)).

As stated in the question, I’m a bit confused about how the Remove.groups output from the mock community removal (the step just before OTU clustering) differs from the Remove.lineage output (which is the data prior to the introduction of the mock community data).

I understand that Remove.groups is run after the “Calculate error rates based on our mock community” step (hereafter referred to as Error Rate Calculation) to get rid of the mock community, since it is not needed for further analysis (in this case, OTU clustering). However, prior to the Error Rate Calculation step, there is already a Remove.lineage output that is free of mock community data.

I skipped the Error Rate Calculation step, which means I do not have the Remove.groups output that comes from removing the mock community from the data set. When I open Cluster.split, the Remove.lineage outputs are the only obvious data available as input. This makes me wonder: regardless of whether I perform the Error Rate Calculation, I still have the Remove.lineage data, which has no mock community data at all. Why is it necessary to run Remove.groups when there is a Remove.lineage output without the mock community data?

How are they different? If I did not perform the Error Rate Calculation step (and so have no Remove.groups output, since I did not use a mock community that would need removing), can I just proceed with the Remove.lineage data as input?

I hope I was able to express my question clearly. Thanks for taking the time to read this.

Hi there,

The remove.groups command will remove a sample (e.g. MOCK), whereas remove.lineage will remove a taxonomic grouping (e.g. Archaea).

If you don’t have a mock community or any other samples to remove, then you don’t need to bother with remove.groups or get.groups. But you will probably still want to use remove.lineage.
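To make the distinction concrete, here is a small toy sketch in Python. It is not how mothur is implemented and does not use mothur's file formats; the `Read` record, the sample names (`F3D0`, `Mock`), and the two helper functions are all made up for illustration. It only shows that filtering by taxonomy and filtering by sample act on different properties of a read, so a taxonomy filter on its own leaves reads from a mock sample in place.

```python
# Toy illustration only: not mothur's implementation or file formats.
# All names below (Read, F3D0, Mock, seq1..seq4) are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    name: str      # sequence identifier
    sample: str    # the sample (group) the read came from
    taxonomy: str  # semicolon-separated lineage


def remove_groups(reads, groups):
    """Drop every read that belongs to one of the named samples."""
    return [r for r in reads if r.sample not in groups]


def remove_lineage(reads, taxa):
    """Drop every read whose lineage contains one of the named taxa."""
    return [r for r in reads if not set(r.taxonomy.split(";")) & set(taxa)]


reads = [
    Read("seq1", "F3D0", "Bacteria;Firmicutes"),
    Read("seq2", "F3D0", "Archaea;Euryarchaeota"),
    Read("seq3", "Mock", "Bacteria;Proteobacteria"),
    Read("seq4", "Mock", "Archaea;Euryarchaeota"),
]

# Taxonomy filter: removes Archaea from every sample, Mock included.
after_lineage = remove_lineage(reads, {"Archaea"})

# Sample filter: removes whatever is left of the Mock sample.
after_groups = remove_groups(after_lineage, {"Mock"})

print([r.name for r in after_lineage])
print([r.name for r in after_groups])
```

In this toy data the bacterial read from the Mock sample survives the taxonomy filter and is only dropped by the sample filter. If your own data set has no mock sample, the second step has nothing to remove.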



I would say you don’t “have to” remove your positive control if you plan on using it in your downstream analysis. As Prof. Schloss said, you definitely need to remove useless sequences with remove.lineage. The fewer useless sequences or errors there are, the better for building the distance matrix and the subsequent clustering into OTUs.

To make sure that my students do not get things mixed up, I always make them look at the error rate at the end of the pipeline.

Hope it helps, and always use controls.
