Good day, everyone! I’d like to ask for some clarification regarding the MiSeq mothur pipeline in Galaxy (I am following the tutorial from this link: 16S Microbial Analysis with mothur (extended)).
As stated in the question, I’m a bit confused about how the Remove.groups output from the mock community removal (the step before OTU clustering) differs from the Remove.lineage output (the data produced before the mock community section of the tutorial).
I understand that Remove.groups is run after the “Calculate error rates based on our mock community” step (hereafter referred to as Error Rate Calculation) to remove the mock community, since it is not needed for further analysis (in this case, OTU clustering). However, before the Error Rate Calculation step, there is already a Remove.lineage output that contains no mock community data at all.
I skipped the Error Rate Calculation step, which means I do not have the Remove.groups output obtained by removing the mock community from the dataset. When I open Cluster.split, the Remove.lineage outputs are the only obvious data available as input. This makes me wonder: regardless of whether I perform the Error Rate Calculation, I still have the Remove.lineage data, which has no mock community data at all. Why is Remove.groups necessary when there is a Remove.lineage output without the mock community data?
How are they different? If I skip the Error Rate Calculation step (and therefore have no Remove.groups output, since I did not use any mock community that would need removing), can I just proceed with the Remove.lineage data as input?
I hope I have expressed my question clearly. Thanks for taking the time to read this.