Remove human contamination


I’m not sure if there’s a command available to directly remove reads suspected of being human from a data set, something similar to BMTagger. If not, would a reasonable approach be to use the latest taxonomy that includes Mitochondria to classify reads and then simply remove those that are not classified as bacteria or archaea? I think I’d still have to compare any unclassified reads against a human database of some kind.


The pds version of the RDP taxonomy contains mitochondria and chloroplasts. We have also found that if sequences classify poorly at the kingdom level (i.e., not 100%), they are generally human/mouse (18S).

Thanks. So the strategy should be to filter out all mitochondrial/chloroplast sequences and throw out anything that doesn’t classify as 100% bacteria or archaea. That’s what I’ll do.
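
For anyone wanting to apply that filter outside the pipeline, here is a rough Python sketch of the strategy described above. It assumes a tab-separated taxonomy file in which each line holds a read name followed by a semicolon-delimited lineage with confidence scores in parentheses, e.g. `read_1<TAB>Bacteria(100);Firmicutes(98);`. That layout, the function names, and the read IDs in the example are assumptions made for illustration, so adjust them to whatever your classifier actually writes out.

```python
import re

# Assumed rank format: a taxon name optionally followed by "(confidence)",
# e.g. "Bacteria(100)". This layout is an assumption, not a guaranteed spec.
RANK = re.compile(r"^(?P<name>.*?)(?:\((?P<conf>\d+)\))?$")

ORGANELLES = ("mitochondria", "chloroplast")
KEEP_KINGDOMS = {"bacteria", "archaea"}


def parse_lineage(lineage):
    """Split 'Bacteria(100);Firmicutes(98);' into [(name, confidence), ...]."""
    ranks = []
    for field in lineage.strip().strip(";").split(";"):
        match = RANK.match(field.strip())
        conf = int(match.group("conf")) if match.group("conf") else None
        ranks.append((match.group("name"), conf))
    return ranks


def keep_read(lineage, min_kingdom_conf=100):
    """Keep a read only if its kingdom is Bacteria/Archaea at full confidence
    and no rank in the lineage names mitochondria or chloroplast."""
    ranks = parse_lineage(lineage)
    kingdom, conf = ranks[0]
    if kingdom.lower() not in KEEP_KINGDOMS:
        return False
    if conf is None or conf < min_kingdom_conf:
        return False
    return not any(org in name.lower() for name, _ in ranks for org in ORGANELLES)


def filter_taxonomy(lines):
    """Return the IDs of reads that pass the filter, in input order."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        read_id, lineage = line.split("\t", 1)
        if keep_read(lineage):
            kept.append(read_id)
    return kept


if __name__ == "__main__":
    # Hypothetical records, made up for illustration only.
    example = [
        "r1\tBacteria(100);Firmicutes(98);",
        "r2\tBacteria(85);Proteobacteria(60);",
        "r3\tBacteria(100);Cyanobacteria(100);Chloroplast(100);",
        "r4\tArchaea(100);Euryarchaeota(99);",
    ]
    print(filter_taxonomy(example))
```

The returned IDs can then be used to subset the FASTA file. Reads that come back unclassified are dropped by this filter as well, so a separate comparison against a human reference is only needed if you want to confirm what they are.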