Removing (contaminating) sequences AFTER OTU clustering

Is there any way to remove selected sequences (not OTUs) from the counts after OTU clustering?

I want to remove potential contaminant sequences from some but not all samples, because I have samples with very low and very high biomass and only the former are affected by reagent contamination. I want to do this at the sequence level because the OTU level is too coarse. So far I have done this after chimera removal but before OTU clustering. However, dist.seqs and cluster seem to produce slightly different results when given the cleaned sequence set: not only are the removed sequences gone, but the remaining sequences are also clustered into slightly different OTUs. As a result, I can’t directly compare the two differently treated datasets.

So I would like to do this:

  • Take the sequence-based OTU count table produced from the full dataset
  • From these counts, subtract the number of reads that the contaminant sequences contributed to each OTU in each sample

If I were to remove sequences based on contaminants from a negative control sample, I would do it on a sequence basis, not an OTU basis. There isn’t a function within mothur to do this for you, so you would have to write your own R or Python script to do it.

Pat
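
For anyone attempting the script route, here is a minimal Python sketch of the subtraction step described in the question. It is not part of mothur and is only one possible approach. It assumes you have already parsed your files into plain dictionaries: per-sample OTU counts (for example from a .shared file), the OTU assignment of each unique sequence (from a .list file), and per-sample counts for each unique sequence (from a .count_table). The function name, argument names, and toy data are all hypothetical.

```python
def subtract_contaminants(otu_counts, seq_to_otu, seq_counts,
                          contaminants, affected_samples):
    """Return a copy of otu_counts with contaminant reads subtracted.

    otu_counts:       {otu: {sample: reads}}       (assumed parsed from .shared)
    seq_to_otu:       {sequence: otu}              (assumed parsed from .list)
    seq_counts:       {sequence: {sample: reads}}  (assumed parsed from .count_table)
    contaminants:     set of sequence names flagged as contaminants
    affected_samples: set of samples to clean (e.g. the low-biomass ones)
    """
    # Copy the table so the original stays available for comparison.
    cleaned = {otu: dict(per_sample) for otu, per_sample in otu_counts.items()}

    for seq in contaminants:
        otu = seq_to_otu.get(seq)
        if otu is None or otu not in cleaned:
            continue  # sequence is not assigned to any OTU in this table
        for sample, reads in seq_counts.get(seq, {}).items():
            if sample not in affected_samples:
                continue  # leave unaffected (high-biomass) samples untouched
            current = cleaned[otu].get(sample, 0)
            # Clamp at zero in case the input files are inconsistent.
            cleaned[otu][sample] = max(current - reads, 0)
    return cleaned


# Toy data, invented purely for illustration.
otu_counts = {"Otu1": {"low": 10, "high": 500}, "Otu2": {"low": 4, "high": 0}}
seq_to_otu = {"seqA": "Otu1", "seqB": "Otu1", "seqC": "Otu2"}
seq_counts = {
    "seqA": {"low": 7, "high": 300},
    "seqB": {"low": 3, "high": 200},
    "seqC": {"low": 4, "high": 0},
}
cleaned = subtract_contaminants(
    otu_counts, seq_to_otu, seq_counts,
    contaminants={"seqB", "seqC"}, affected_samples={"low"},
)
print(cleaned)
```

Because the OTU assignments come from the clustering of the full dataset, the OTU definitions stay identical between the original and the cleaned table, so the two can be compared directly. OTUs that drop to zero in every sample are kept here; whether to prune them is a separate decision.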