Problem with remove.lineage in fungi pipeline

Dear Mothur team,
I hope you are all healthy and safe!
I am following the mothur.fungal.batch file from the krmaas/bioinformatics repository on GitHub (bioinformatics/mothur.fungal.batch at master) to process a fungal dataset.
The commands I ran just before the problem appears are:
->chimera.uchime(fasta=fungi.trim.contigs.good.unique.precluster.fasta, count=fungi.trim.contigs.good.unique.precluster.count_table, dereplicate=t)
->remove.seqs(accnos=current, fasta=current)
->classify.seqs(fasta=current, count=current, taxonomy=UNITEv8_sh_dynamic_s_all.tax, reference=UNITEv8_sh_dynamic_s_all.fasta, cutoff=60)
->remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Protista-unknown-Metazoa-Viridiplantae-Stramenopila-Rhizaria)
I did not get any error message here, so I assumed the taxa had been removed, only to reach the end of the pipeline and discover that they had been kept.
I tried to upload the final.agc.0.03.pick.0.03.cons.tax.summary file, without success.

I ran the pipeline on Mac and on Linux (v.1.44.3). The problem occurred on both.
How can I fix it?
Huge thanks in advance!
Cris

Could you run summary.seqs after remove.seqs and remove.lineage and post the output? Also, could you post the first few lines of the *.taxonomy file that is generated by classify.seqs?

Thanks,
Pat

Hi Pat,
I am sorry for the late reply; I needed to re-run the pipeline.

Below is the summary.seqs output after remove.seqs:

summary.seqs(fasta=current, count=current)
Using fungi.trim.contigs.good.unique.precluster.denovo.uchime.pick.count_table as input file for the count parameter.
Using fungi.trim.contigs.good.unique.precluster.pick.fasta as input file for the fasta parameter.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	190	190	0	3	1
2.5%-tile:	1	224	224	0	4	25530
25%-tile:	1	265	265	0	5	255294
Median: 	1	293	293	0	5	510587
75%-tile:	1	314	314	0	6	765880
97.5%-tile:	1	382	382	0	7	995644
Maximum:	1	400	400	0	8	1021173
Mean:	1	293	293	0	5
# of unique seqs:	320843
total # of seqs:	1021173

It took 6 secs to summarize 1021173 sequences.
Output File Names:
fungi.trim.contigs.good.unique.precluster.pick.summary

Below is the summary.seqs output after remove.lineage:

summary.seqs(fasta=current, count=current)
Using fungi.trim.contigs.good.unique.precluster.denovo.uchime.pick.pick.count_table as input file for the count parameter.
Using fungi.trim.contigs.good.unique.precluster.pick.pick.fasta as input file for the fasta parameter.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	190	190	0	3	1
2.5%-tile:	1	222	222	0	4	24173
25%-tile:	1	260	260	0	4	241729
Median: 	1	293	293	0	5	483458
75%-tile:	1	310	310	0	6	725186
97.5%-tile:	1	382	382	0	7	942742
Maximum:	1	400	400	0	8	966914
Mean:	1	291	291	0	5
# of unique seqs:	291779
total # of seqs:	966914

It took 6 secs to summarize 966914 sequences.
Output File Names:
fungi.trim.contigs.good.unique.precluster.pick.pick.summary
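
For reference, the difference between the two summaries can be worked out with a few lines of Python. This is only a quick sketch: the numbers are copied from the two summary.seqs outputs above, and the variable names are illustrative.

```python
# Totals copied from the two summary.seqs outputs in this post.
before = {"unique": 320843, "total": 1021173}  # after remove.seqs
after = {"unique": 291779, "total": 966914}    # after remove.lineage

# Number of sequences that disappeared between the two steps.
removed = {key: before[key] - after[key] for key in before}
print(removed)  # {'unique': 29064, 'total': 54259}
```

So remove.lineage did remove some sequences (54259 in total, 29064 unique), even though the Metazoa ones are still there.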

Below are the first ten lines of the taxonomy file, where you can see a Porifera classification that should have been removed by remove.lineage:

OTU	Size	Taxonomy
Otu00001	87182	k__Fungi(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);
Otu00002	55730	k__Fungi(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);
Otu00003	27179	k__Metazoa(100);p__Porifera(100);c__Demospongiae(100);o__Verongida(100);f__Aplysinidae(100);g__Aplysina(100);s__Aplysina_archeri(100);
Otu00004	25082	k__Fungi(100);p__Basidiomycota(100);c__Agaricomycetes(100);o__Agaricales(100);o__Agaricales_unclassified(87);o__Agaricales_unclassified(87);o__Agaricales_unclassified(87);
Otu00005	23906	k__Fungi(100);p__Basidiomycota(100);c__Tremellomycetes(100);o__Tremellales(100);f__Trimorphomycetaceae(100);g__Saitozyma(100);s__Saitozyma_podzolica(100);
Otu00006	23295	k__Fungi(100);p__Ascomycota(100);c__Dothideomycetes(100);o__Pleosporales(100);f__Pleosporales_fam_Incertae_sedis(100);g__Pseudorobillarda(100);s__Pseudorobillarda_sp(100);
Otu00007	19504	k__Fungi(100);p__Basidiomycota(100);c__Agaricomycetes(100);o__Hymenochaetales(100);f__Hymenochaetales_fam_Incertae_sedis(100);g__Resinicium(100);s__Resinicium_saccharicola(100);
Otu00008	16744	k__Fungi(100);p__Basidiomycota(100);c__Agaricomycetes(100);o__Polyporales(100);f__Xenasmataceae(100);g__Phlebiella(100);s__Phlebiella_borealis(100);
Otu00009	15053	k__Fungi(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);

Let me know if you need more information.
Huge thanks!!!
Cris

Thanks! Looking at the cons.taxonomy file, I think you’re using the wrong names in remove.lineage. You need to use each name exactly as it is written in the taxonomy file. For example, if you wanted to remove the Metazoa, you would need to write k__Metazoa, not Metazoa.
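
One way to check which names the taxonomy file actually uses, before passing them to remove.lineage, is a small Python script like the sketch below. It is only an illustration: the helper names (`strip_confidence`, `first_rank_names`, `build_taxon_argument`) are made up here and are not part of mothur, and the sample strings are copied from the listing posted above. It assumes the names are joined with dashes, as in the remove.lineage command earlier in this thread.

```python
import re

def strip_confidence(taxon):
    """Drop a trailing confidence score such as '(100)' from one taxon name."""
    return re.sub(r"\(\d+\)$", "", taxon)

def first_rank_names(taxonomy_strings):
    """Return the distinct first-rank names found in the taxonomy strings."""
    names = set()
    for tax in taxonomy_strings:
        first = tax.split(";")[0]
        names.add(strip_confidence(first))
    return sorted(names)

def build_taxon_argument(wanted, available):
    """Map bare names (e.g. 'Metazoa') to the names as written in the file
    (e.g. 'k__Metazoa') and join them with dashes. Names with no match are
    returned separately so they can be checked by hand."""
    matched, missing = [], []
    for name in wanted:
        hits = [a for a in available if a == name or a.endswith("__" + name)]
        if hits:
            matched.extend(hits)
        else:
            missing.append(name)
    return "-".join(matched), missing

# Taxonomy strings copied from the listing in this thread (Otu00001, Otu00003, Otu00005).
sample = [
    "k__Fungi(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);",
    "k__Metazoa(100);p__Porifera(100);c__Demospongiae(100);o__Verongida(100);f__Aplysinidae(100);g__Aplysina(100);s__Aplysina_archeri(100);",
    "k__Fungi(100);p__Basidiomycota(100);c__Tremellomycetes(100);o__Tremellales(100);f__Trimorphomycetaceae(100);g__Saitozyma(100);s__Saitozyma_podzolica(100);",
]

available = first_rank_names(sample)
taxon_arg, missing = build_taxon_argument(["Metazoa", "Viridiplantae"], available)
print(available)   # ['k__Fungi', 'k__Metazoa']
print(taxon_arg)   # k__Metazoa
print(missing)     # ['Viridiplantae']
```

With the full taxonomy file, the strings would be read from the last column of each line instead of the hard-coded `sample` list. Any name that ends up in `missing` does not occur at the first rank and is worth looking up by hand.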

Pat

You were right, Pat.
I corrected the names of the taxa I wanted to remove in remove.lineage and it worked.
Huge thanks!
Cris