Dear Mothur team,
I hope you are all healthy and safe!
I am following the batch file at bioinformatics/mothur.fungal.batch at master · krmaas/bioinformatics · GitHub to process a fungal dataset.
The commands leading up to the “problem” are:
->chimera.uchime(fasta=fungi.trim.contigs.good.unique.precluster.fasta, count=fungi.trim.contigs.good.unique.precluster.count_table, dereplicate=t)
->remove.seqs(accnos=current, fasta=current)
->classify.seqs(fasta=current, count=current, taxonomy=UNITEv8_sh_dynamic_s_all.tax, reference=UNITEv8_sh_dynamic_s_all.fasta, cutoff=60)
->remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Protista-unknown-Metazoa-Viridiplantae-Stramenopila-Rhizaria)
I did not get any error message at this step, so I assumed the taxa had been removed, only to reach the end of the pipeline and discover that they had been kept…
I tried to upload the final.agc.0.03.pick.0.03.cons.tax.summary file without success.
I ran the pipeline with mothur v.1.44.3 on both Mac and Linux. The problem occurred on both.
How can I fix it?
Huge thanks in advance!
Cris
Could you run summary.seqs after remove.seqs and remove.lineage and post the output? Also, could you post the first few lines of the *.taxonomy file that is generated by classify.seqs?
Thanks,
Pat
Hi Pat,
I am sorry for the late reply; I needed to re-run the pipeline.
Below is the summary.seqs output after remove.seqs:
summary.seqs(fasta=current, count=current)
Using fungi.trim.contigs.good.unique.precluster.denovo.uchime.pick.count_table as input file for the count parameter.
Using fungi.trim.contigs.good.unique.precluster.pick.fasta as input file for the fasta parameter.
             Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:     1      190  190     0       3        1
2.5%-tile:   1      224  224     0       4        25530
25%-tile:    1      265  265     0       5        255294
Median:      1      293  293     0       5        510587
75%-tile:    1      314  314     0       6        765880
97.5%-tile:  1      382  382     0       7        995644
Maximum:     1      400  400     0       8        1021173
Mean:        1      293  293     0       5
# of unique seqs: 320843
total # of seqs: 1021173
It took 6 secs to summarize 1021173 sequences.
Output File Names:
fungi.trim.contigs.good.unique.precluster.pick.summary
Below is the summary.seqs output after remove.lineage:
summary.seqs(fasta=current, count=current)
Using fungi.trim.contigs.good.unique.precluster.denovo.uchime.pick.pick.count_table as input file for the count parameter.
Using fungi.trim.contigs.good.unique.precluster.pick.pick.fasta as input file for the fasta parameter.
             Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:     1      190  190     0       3        1
2.5%-tile:   1      222  222     0       4        24173
25%-tile:    1      260  260     0       4        241729
Median:      1      293  293     0       5        483458
75%-tile:    1      310  310     0       6        725186
97.5%-tile:  1      382  382     0       7        942742
Maximum:     1      400  400     0       8        966914
Mean:        1      291  291     0       5
# of unique seqs: 291779
total # of seqs: 966914
It took 6 secs to summarize 966914 sequences.
Output File Names:
fungi.trim.contigs.good.unique.precluster.pick.pick.summary
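Comparing the two summaries shows that remove.lineage did remove some sequences, just not all of the unwanted ones. The arithmetic can be sketched in Python as below; this is only an illustration and not part of mothur, and the function name parse_counts is made up for the example. The counts are copied from the two outputs above.

```python
import re


def parse_counts(summary_text):
    """Pull the unique and total sequence counts out of summary.seqs screen output."""
    unique = int(re.search(r"# of unique seqs:\s*(\d+)", summary_text).group(1))
    total = int(re.search(r"total # of seqs:\s*(\d+)", summary_text).group(1))
    return unique, total


# Count lines copied from the summary.seqs output after remove.seqs
before = """# of unique seqs: 320843
total # of seqs: 1021173"""

# Count lines copied from the summary.seqs output after remove.lineage
after = """# of unique seqs: 291779
total # of seqs: 966914"""

unique_before, total_before = parse_counts(before)
unique_after, total_after = parse_counts(after)

removed_unique = unique_before - unique_after
removed_total = total_before - total_after

print(f"unique sequences removed: {removed_unique}")
print(f"total sequences removed:  {removed_total}")
print(f"fraction of total removed: {removed_total / total_before:.1%}")
```

So the command ran and dropped part of the data set, which is why no error was reported even though some lineages remained.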
Below are the first ten lines of the taxonomy output, where you can see a Porifera affiliation that should have been removed by remove.lineage:
OTU Size Taxonomy
Otu00001 87182 k__Fungi(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);
Otu00002 55730 k__Fungi(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);
Otu00003 27179 k__Metazoa(100);p__Porifera(100);c__Demospongiae(100);o__Verongida(100);f__Aplysinidae(100);g__Aplysina(100);s__Aplysina_archeri(100);
Otu00004 25082 k__Fungi(100);p__Basidiomycota(100);c__Agaricomycetes(100);o__Agaricales(100);o__Agaricales_unclassified(87);o__Agaricales_unclassified(87);o__Agaricales_unclassified(87);
Otu00005 23906 k__Fungi(100);p__Basidiomycota(100);c__Tremellomycetes(100);o__Tremellales(100);f__Trimorphomycetaceae(100);g__Saitozyma(100);s__Saitozyma_podzolica(100);
Otu00006 23295 k__Fungi(100);p__Ascomycota(100);c__Dothideomycetes(100);o__Pleosporales(100);f__Pleosporales_fam_Incertae_sedis(100);g__Pseudorobillarda(100);s__Pseudorobillarda_sp(100);
Otu00007 19504 k__Fungi(100);p__Basidiomycota(100);c__Agaricomycetes(100);o__Hymenochaetales(100);f__Hymenochaetales_fam_Incertae_sedis(100);g__Resinicium(100);s__Resinicium_saccharicola(100);
Otu00008 16744 k__Fungi(100);p__Basidiomycota(100);c__Agaricomycetes(100);o__Polyporales(100);f__Xenasmataceae(100);g__Phlebiella(100);s__Phlebiella_borealis(100);
Otu00009 15053 k__Fungi(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);
Let me know if you need more information.
Huge thanks!!!
Cris
Thanks! Looking at the cons.taxonomy file, I think you’re using the wrong names in remove.lineage. You need to use each name exactly as it is written in the taxonomy file. For example, if you wanted to remove the Metazoa, you would need to say k__Metazoa, not Metazoa.
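One way to catch this kind of mismatch before running remove.lineage is to compare the requested names with the labels that actually appear in the taxonomy file. The Python sketch below is not part of mothur, and the function names rank_labels and check_taxa are made up for illustration. It only compares label spelling and does not reproduce how remove.lineage matches taxa internally. The sample lines are two rows copied from the listing above.

```python
def rank_labels(cons_taxonomy_lines):
    """Collect every label used in the Taxonomy column, with confidence scores stripped."""
    labels = set()
    for line in cons_taxonomy_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] == "OTU":
            continue  # skip the header row and blank or malformed lines
        for entry in fields[2].split(";"):
            if entry:
                labels.add(entry.split("(")[0])  # "k__Metazoa(100)" -> "k__Metazoa"
    return labels


def check_taxa(requested, labels):
    """For each requested name, report whether it appears verbatim and which
    rank-prefixed labels (e.g. 'k__Name') would match it instead."""
    report = {}
    for name in requested:
        candidates = sorted(
            label for label in labels
            if "__" in label and label.split("__", 1)[1] == name
        )
        report[name] = {"exact": name in labels, "candidates": candidates}
    return report


# Two rows copied from the taxonomy listing in this thread
sample = [
    "OTU Size Taxonomy",
    "Otu00003 27179 k__Metazoa(100);p__Porifera(100);c__Demospongiae(100);"
    "o__Verongida(100);f__Aplysinidae(100);g__Aplysina(100);s__Aplysina_archeri(100);",
    "Otu00005 23906 k__Fungi(100);p__Basidiomycota(100);c__Tremellomycetes(100);"
    "o__Tremellales(100);f__Trimorphomycetaceae(100);g__Saitozyma(100);"
    "s__Saitozyma_podzolica(100);",
]

# The taxon string used in the original remove.lineage call
requested = "Protista-unknown-Metazoa-Viridiplantae-Stramenopila-Rhizaria".split("-")

report = check_taxa(requested, rank_labels(sample))
for name, info in report.items():
    print(name, info)
```

With these two sample rows, Metazoa is not found verbatim but k__Metazoa is listed as a candidate. The other names have no candidates only because the sample is tiny; on the full file they should be checked the same way.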
Pat
You were right, Pat.
I corrected the names of the taxa I wanted to remove in remove.lineage and it worked.
Huge thanks!
Cris