remove.lineage: Removing Cyanobacteria->Chloroplast sequences


Dear all,

I am working with prokaryotic sequences from the marine environment, and a large share of them classify as chloroplasts or mitochondria.

I am trying to find a way to remove the sequences that classify as Cyanobacteria at the higher taxonomic levels but match chloroplasts/plastids (using the SINA ref database), without removing all reads that classify as Cyanobacteria, because cyanobacteria are important members of the microbial community.

Any suggestions on how I would be able to accomplish this?


In the SILVA reference the taxonomy string for chloroplasts is “Bacteria;Cyanobacteria;Chloroplast;”. So if you just use taxon=Chloroplast you should be fine.
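To illustrate why a bare taxon name is enough, here is a minimal Python sketch of the idea. It is not mothur's code: it assumes the taxon is matched as a plain substring of the lineage once the confidence scores are stripped, and the sequence names, lineages and helper functions (`strip_confidence`, `keep_sequence`) are made up for the example.

```python
import re


def strip_confidence(taxonomy: str) -> str:
    """Drop confidence scores such as '(100)' from a taxonomy string."""
    return re.sub(r"\(\d+\)", "", taxonomy)


def keep_sequence(taxonomy: str, taxon: str = "Chloroplast") -> bool:
    """Keep a sequence unless the taxon occurs somewhere in its lineage.

    Assumption: plain substring matching, for illustration only.
    """
    return taxon not in strip_confidence(taxonomy)


# Hypothetical entries, in the "name -> lineage" shape of a taxonomy file.
lineages = {
    "seq1": "Bacteria(100);Cyanobacteria(100);Chloroplast(99);",
    "seq2": "Bacteria(100);Cyanobacteria(100);Cyanobacteria_unclassified(98);",
    "seq3": "Bacteria(100);Proteobacteria(100);Alphaproteobacteria(99);",
}

kept = [name for name, tax in lineages.items() if keep_sequence(tax)]
print(kept)  # seq1 is dropped; the other Cyanobacteria read (seq2) is kept
```

Under that assumption only the chloroplast lineage is dropped, and the remaining Cyanobacteria reads stay in the data set.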



Hi Pat.

I just ran this in my batch file over the weekend:

remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Bacteria;Cyanobacteria_Chloroplast;Chloroplast-Mitochondria-unknown-Eukaryota)

but my final taxonomy file still included chloroplasts. Am I missing something obvious? This is a script we’ve been using for a long time, so I’m confused as to why I’m suddenly seeing chloroplasts again.

EDIT: So it appears that it IS obvious. Previously we were seeing chloroplasts SPECIFICALLY in the Cyanobacteria_Chloroplast hybrid phylum, which is why our code was written the way it was. I mistakenly thought I was also following Pat’s advice and targeting chloroplasts in general, but as you can see in my code, the only chloroplasts I was removing were the ones in that “Class.” Once I added a separate Chloroplast entry to the taxon list to cover chloroplasts in general, it was fixed.

remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Bacteria;Cyanobacteria_Chloroplast;Chloroplast-Chloroplast-Mitochondria-unknown-Eukaryota)
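For anyone who hits the same thing, the difference between the two taxon lists can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not mothur's implementation: the list is split on hyphens, and each entry is treated as a plain substring of the lineage. The helper names (`parse_taxa`, `is_removed`) and the two example lineages are hypothetical.

```python
def parse_taxa(taxon_arg: str) -> list[str]:
    """Split a hyphen-separated taxon argument into individual entries."""
    return [t for t in taxon_arg.split("-") if t]


def is_removed(lineage: str, taxon_arg: str) -> bool:
    """True when any entry occurs in the lineage (assumed substring match)."""
    return any(t in lineage for t in parse_taxa(taxon_arg))


old_arg = "Bacteria;Cyanobacteria_Chloroplast;Chloroplast-Mitochondria-unknown-Eukaryota"
new_arg = "Bacteria;Cyanobacteria_Chloroplast;Chloroplast-Chloroplast-Mitochondria-unknown-Eukaryota"

# Hypothetical lineages: one under the hybrid phylum, one under plain Cyanobacteria.
hybrid = "Bacteria;Cyanobacteria_Chloroplast;Chloroplast;"
plain = "Bacteria;Cyanobacteria;Chloroplast;"

print(is_removed(hybrid, old_arg))  # True: the full path matches
print(is_removed(plain, old_arg))   # False: the full path does not occur in this lineage
print(is_removed(plain, new_arg))   # True: the bare Chloroplast entry matches
```

Note that under the same substring assumption a bare Chloroplast entry would also match any lineage that carries the label Cyanobacteria_Chloroplast, so it is worth checking the output taxonomy file to confirm that the cyanobacteria you want are still there.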