Remove.lineage removes from tax file not fasta

Hi Everyone,

I’ve been trying to generate my own 18S rDNA taxonomy and fasta files directly from SILVA. The v123 and v119 versions of the mothur-formatted taxonomy files share a problem: the taxonomic strings are hard cut at 6 levels. This isn’t a problem for the bacterial sequences, but eukaryotic sequences have many “in-between” ranks such as “superorders” or “subfamilies” in the taxonomy file. Because of the required mothur formatting, the classification of many eukaryotic sequences is cut off early. Hopefully I can change that.

At any rate, right now I’m trying to eliminate all bacterial sequences from the files using remove.lineage, as seen below:

remove.lineage(fasta=18S_V123_SILVA_EDITS.fasta, taxonomy=18S_V123_SILVA_REFINED.taxonomy, taxon=Bacteria;)

After running the command, the taxonomy file is edited as expected, with no bacterial taxon strings, but the fasta file appears to be untouched. As far as I can tell, none of the bacterial sequences have been removed from the fasta file. Curiously, though, the size of the fasta file has decreased ever so slightly (from 4.54 GB to 4.49 GB).

I just wanted to troubleshoot and see whether I’m entering the command incorrectly and it isn’t finishing for that reason, or whether something else is going on.

Thank you for all of your help,

- Jake

Hmm… That seems odd. Are there duplicate sequence names in the fasta or taxonomy files? Have you tried the following?

mothur > remove.lineage(fasta=18S_V123_SILVA_EDITS.fasta, taxonomy=18S_V123_SILVA_REFINED.taxonomy, taxon=Bacteria;)
mothur > list.seqs(taxonomy=current) - list sequences in your filtered taxonomy file
mothur > get.seqs(fasta=current) - select those sequences from your fasta file
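
If you want to check for duplicate names outside of mothur, here is a minimal Python sketch. It is not part of mothur; the function names and the demo data are made up for illustration. It assumes the sequence name is the text after ">" up to the first whitespace in the fasta file, and the first whitespace-separated column in the taxonomy file.

```python
from collections import Counter


def fasta_ids(lines):
    """Yield the sequence name from each fasta header line."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                # Assumption: the name ends at the first whitespace.
                yield fields[0]


def taxonomy_ids(lines):
    """Yield the sequence name (first column) from each taxonomy line."""
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[0]


def duplicates(ids):
    """Return a sorted list of names that occur more than once."""
    counts = Counter(ids)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical demo data standing in for the real files.
demo_fasta = [">seqA\tsome description", "ACGT", ">seqB", "GGCC", ">seqA", "TTAA"]
demo_tax = ["seqA\tEukaryota;Opisthokonta;", "seqB\tBacteria;Proteobacteria;"]

print(duplicates(fasta_ids(demo_fasta)))
print(duplicates(taxonomy_ids(demo_tax)))
```

For the real files, pass an open file handle instead of a list, for example `with open("18S_V123_SILVA_EDITS.fasta") as fh: print(duplicates(fasta_ids(fh)))`. An empty list from both files would suggest that duplicate names are not the cause.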