Hi! How can I remove all OTUs present in my negative control samples from all sample groups (preferably only those present in ALL of the several negative controls, perhaps combined with some kind of relative abundance threshold)?
Or, if you have cleverer suggestions for removing OTUs that originate from reagent or other contamination, I would highly appreciate any comments. I have some samples with very low microbial abundance, and I expect a major portion of the OTUs to originate from the reagents. I also have 16S qPCR data from these samples, so I could in principle work with absolute abundances.
To remove OTUs you can use the remove.otus command, providing it with an accnos file that contains the OTUs you wish to have removed.
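As a rough illustration of the criterion asked about above (present in ALL negative controls, above a relative abundance threshold), here is a minimal Python sketch that builds the text of such an accnos file, one OTU label per line. The sample names, OTU labels, the nested-dictionary layout, and the 1% threshold are all assumptions made for the example; the counts would have to come from your own OTU table.

```python
def otus_in_all_controls(counts, controls, min_rel_abund=0.0):
    """Return OTUs found in every control at >= min_rel_abund relative abundance.

    counts: {sample_name: {otu_label: read_count}} (hypothetical layout)
    controls: names of the negative control samples
    """
    flagged = None
    for sample in controls:
        total = sum(counts[sample].values())
        present = {
            otu
            for otu, n in counts[sample].items()
            if n > 0 and total > 0 and n / total >= min_rel_abund
        }
        # Keep only OTUs that pass in every control seen so far.
        flagged = present if flagged is None else flagged & present
    return sorted(flagged or [])


def to_accnos(names):
    """Format names as accnos text: one name per line."""
    return "".join(name + "\n" for name in names)


# Made-up counts for two negative controls and one real sample.
counts = {
    "neg1": {"Otu001": 50, "Otu002": 0, "Otu003": 50},
    "neg2": {"Otu001": 30, "Otu002": 10, "Otu003": 60},
    "gut1": {"Otu001": 5, "Otu002": 900, "Otu003": 95},
}

flagged = otus_in_all_controls(counts, ["neg1", "neg2"], min_rel_abund=0.01)
accnos_text = to_accnos(flagged)
print(accnos_text, end="")
```

Saving the printed text to a file would give a list that could be reviewed by hand before anything is actually removed.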
That said, I am not sure that wholesale removal of OTUs that appear in your negative controls is the correct thing to do. Whether to remove an OTU should depend on how abundant it actually is in your samples and on what it is classified as. For example, I often find Pseudomonas contamination in all my negative controls, but I also find it in some of my samples. For the region that I am sequencing (V4 of the 16S), there is a high chance that these pseudomonads, although represented by a single OTU, are not the same organism. Ultimately this OTU appears only in some of my actual samples (where I would expect it) and not in others (where I would not), and only ever at low abundance, so I choose to leave it in but keep it in mind when drawing conclusions about community structure.
How different are the sequence read counts for your negative controls and your actual samples? If they are too close, then perhaps the question isn’t "how do I remove these OTUs after sequencing?" but rather "how do I do a better job of eliminating the contamination that is hampering my downstream analysis?"
Thanks a lot for the comments, Richard. These specific samples are quite challenging technically due to very low microbial numbers. I think I will try a slightly more advanced filtering process rather than wholesale removal of the negative control OTUs, taking into account read counts from individual samples and 16S qPCR data.
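One way such a filter might look, purely as a sketch: scale each OTU's relative abundance by the sample's 16S qPCR copy number to get an estimated absolute abundance, then flag OTUs in a sample whose estimate is not clearly above the level seen in the negative controls. The 10-fold cutoff, the function names, and all numbers below are hypothetical choices for illustration, not a validated method.

```python
def absolute_abundance(otu_counts, qpcr_copies):
    """Estimate per-OTU 16S copies as relative abundance x total qPCR copies."""
    total = sum(otu_counts.values())
    if total == 0:
        return {otu: 0.0 for otu in otu_counts}
    return {otu: n / total * qpcr_copies for otu, n in otu_counts.items()}


def likely_contaminants(sample_abs, control_abs, fold=10.0):
    """Flag OTUs present in the sample at less than `fold` times the control level.

    The fold cutoff is an arbitrary assumption; OTUs absent from the
    control (level 0) are never flagged.
    """
    return sorted(
        otu
        for otu, level in sample_abs.items()
        if level > 0 and level < fold * control_abs.get(otu, 0.0)
    )


# Made-up example: one negative control and one low-biomass sample.
control_abs = absolute_abundance({"Otu001": 80, "Otu002": 20}, qpcr_copies=100)
sample_abs = absolute_abundance({"Otu001": 100, "Otu002": 900}, qpcr_copies=1000)

suspects = likely_contaminants(sample_abs, control_abs, fold=10.0)
print(suspects)
```

Here Otu001 is flagged because its estimated level in the sample is close to the control level, while Otu002 is far above it and is kept.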
I’m leaning towards removing contaminant sequences at the pre.cluster stage rather than after OTUs are made, but I haven’t actually tested that yet. To date I’ve been able to repeat sample extraction and library prep for samples that have been contaminated.
That sounds like a clever way to do it. May I ask if you have a protocol for performing the filtering at the pre.cluster stage? Would it be possible to process the data in Excel or similar at this stage (as we can easily do at the OTU stage)? I am asking in case the complexity of the criteria exceeds my limited coding skills (for example, removing sequences that are present in X controls at a relative abundance of Y). Is it possible to generate a human-readable table, like an OTU table, for the preclustered sequences and then put the processed dataset back into the mothur workflow (or perhaps use it to generate an accnos file for removing some of the sequences)?
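A possible route, sketched here under assumptions: if the preclustered sequences come with a tab-delimited count table that has one row per sequence and one column per group (assumed below to start with `Representative_Sequence` and `total` columns, in uncompressed form), that table can be opened in a spreadsheet as it is, or filtered with a few lines of Python to produce the names for an accnos file. mothur's remove.seqs command accepts an accnos file. The group names, sequence names, and the 5% threshold are invented for the example.

```python
import csv
import io


def flag_sequences(count_table_text, controls, min_rel_abund=0.0):
    """Return sequence names present in every control column at or above
    min_rel_abund, where relative abundance is count / column total."""
    rows = list(csv.DictReader(io.StringIO(count_table_text), delimiter="\t"))
    totals = {c: sum(int(r[c]) for r in rows) for c in controls}
    flagged = []
    for r in rows:
        in_all_controls = all(
            totals[c] > 0
            and int(r[c]) > 0
            and int(r[c]) / totals[c] >= min_rel_abund
            for c in controls
        )
        if in_all_controls:
            flagged.append(r["Representative_Sequence"])
    return flagged


# Made-up table in the assumed layout; in practice, read the text from a file.
example = (
    "Representative_Sequence\ttotal\tneg1\tneg2\tgut1\n"
    "seqA\t120\t60\t40\t20\n"
    "seqB\t510\t0\t10\t500\n"
    "seqC\t130\t40\t50\t40\n"
)

flagged_seqs = flag_sequences(example, ["neg1", "neg2"], min_rel_abund=0.05)
accnos_text = "".join(name + "\n" for name in flagged_seqs)  # one name per line
print(accnos_text, end="")
```

The same selection could be made by hand in a spreadsheet; the only thing that has to go back into the workflow is the one-name-per-line list.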