I am working with prokaryotic sequences from the marine environment that contain a large number of chloroplast and mitochondrial classification hits.
I am trying to find a way to remove the sequences that match chloroplasts/plastids (using the SINA ref database), which sit under Cyanobacteria at the higher taxonomic levels, without removing all reads that classify as cyanobacteria, because cyanobacteria are important members of the microbial community.
Any suggestions on how I would be able to accomplish this?
In the SILVA reference the taxonomy string for chloroplasts is “Bacteria;Cyanobacteria;Chloroplast;”. So if you just use taxon=Chloroplast you should be fine.
I just ran this over the weekend in my batch file:
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Bacteria;Cyanobacteria_Chloroplast;Chloroplast-Mitochondria-unknown-Eukaryota)
but my final taxonomy file still included chloroplasts. Am I missing something obvious? This is a script we’ve been using for a long time, so I’m confused as to why I’m suddenly seeing chloroplasts again.
EDIT: So it appears that it IS obvious. Before, we were seeing chloroplasts SPECIFICALLY in the cyanobacteria_chloroplast hybrid phylum, which is why our code was written the way it was. I mistakenly thought I was also following Pat’s advice and removing chloroplasts in general, but as you can see in my code, the only chloroplasts I was removing were the ones in that “Class.” Once I added an additional taxon to cover chloroplasts in general, it was fixed.
Hi,
If I am not mistaken, you can make your life even simpler when you want to get rid of chloroplasts (or anything else). You don't have to give the full taxonomy when using the remove.lineage command: using just “…taxon=Chloroplast…” will get rid of every sequence that contains the word “Chloroplast” anywhere in its taxonomy.
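As a rough sketch, the simplified call could look like the line below. It reuses the file parameters and the other taxa (Mitochondria, unknown, Eukaryota) from the command posted earlier in this thread. That list is an assumption carried over from that post, so adjust it to whatever your own reference taxonomy and study require.

```
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Eukaryota)
```

The dashes separate the individual taxa, so each name is matched on its own. It is worth checking the resulting taxonomy file afterwards to confirm that the remaining Cyanobacteria reads were kept.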