Dear Mothur team,
I hope you are all healthy and safe!
I am following the recommended batch file (bioinformatics/mothur.fungal.batch at master · krmaas/bioinformatics · GitHub) to process a fungal dataset.
The commands I ran just before the problem appears are:
->chimera.uchime(fasta=fungi.trim.contigs.good.unique.precluster.fasta, count=fungi.trim.contigs.good.unique.precluster.count_table, dereplicate=t)
->remove.seqs(accnos=current, fasta=current)
->classify.seqs(fasta=current, count=current, taxonomy=UNITEv8_sh_dynamic_s_all.tax, reference=UNITEv8_sh_dynamic_s_all.fasta, cutoff=60)
->remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Protista-unknown-Metazoa-Viridiplantae-Stramenopila-Rhizaria)
I did not get any error message here, so I assumed the taxa had been removed. Only when I reached the end of the pipeline did I discover that they had been kept.
I tried to upload the final.agc.0.03.pick.0.03.cons.tax.summary file, but the upload failed.
I ran the pipeline on both Mac and Linux with mothur v.1.44.3, and the problem occurred on both.
How can I fix it?
Huge thanks in advance!
Cris
Could you run summary.seqs after remove.seqs and remove.lineage and post the output? Also, could you post the first few lines of the *.taxonomy file that is generated by classify.seqs?
Thanks,
Pat
Hi Pat,
Sorry for the late reply; I needed to re-run the pipeline.
Below is the summary.seqs output after remove.seqs:
summary.seqs(fasta=current, count=current)
Using fungi.trim.contigs.good.unique.precluster.denovo.uchime.pick.count_table as input file for the count parameter.
Using fungi.trim.contigs.good.unique.precluster.pick.fasta as input file for the fasta parameter.
             Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:     1      190  190     0       3        1
2.5%-tile:   1      224  224     0       4        25530
25%-tile:    1      265  265     0       5        255294
Median:      1      293  293     0       5        510587
75%-tile:    1      314  314     0       6        765880
97.5%-tile:  1      382  382     0       7        995644
Maximum:     1      400  400     0       8        1021173
Mean:        1      293  293     0       5
# of unique seqs: 320843
total # of seqs: 1021173
It took 6 secs to summarize 1021173 sequences.
Output File Names:
fungi.trim.contigs.good.unique.precluster.pick.summary
Below is the summary.seqs output after remove.lineage:
summary.seqs(fasta=current, count=current)
Using fungi.trim.contigs.good.unique.precluster.denovo.uchime.pick.pick.count_table as input file for the count parameter.
Using fungi.trim.contigs.good.unique.precluster.pick.pick.fasta as input file for the fasta parameter.
             Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:     1      190  190     0       3        1
2.5%-tile:   1      222  222     0       4        24173
25%-tile:    1      260  260     0       4        241729
Median:      1      293  293     0       5        483458
75%-tile:    1      310  310     0       6        725186
97.5%-tile:  1      382  382     0       7        942742
Maximum:     1      400  400     0       8        966914
Mean:        1      291  291     0       5
# of unique seqs: 291779
total # of seqs: 966914
It took 6 secs to summarize 966914 sequences.
Output File Names:
fungi.trim.contigs.good.unique.precluster.pick.pick.summary
Below are the first ten lines of the taxonomy output, where you can see a Porifera assignment (Otu00003) that should have been removed by remove.lineage:
OTU Size Taxonomy
Otu00001 87182 k__Fungi(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);
Otu00002 55730 k__Fungi(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);
Otu00003 27179 k__Metazoa(100);p__Porifera(100);c__Demospongiae(100);o__Verongida(100);f__Aplysinidae(100);g__Aplysina(100);s__Aplysina_archeri(100);
Otu00004 25082 k__Fungi(100);p__Basidiomycota(100);c__Agaricomycetes(100);o__Agaricales(100);o__Agaricales_unclassified(87);o__Agaricales_unclassified(87);o__Agaricales_unclassified(87);
Otu00005 23906 k__Fungi(100);p__Basidiomycota(100);c__Tremellomycetes(100);o__Tremellales(100);f__Trimorphomycetaceae(100);g__Saitozyma(100);s__Saitozyma_podzolica(100);
Otu00006 23295 k__Fungi(100);p__Ascomycota(100);c__Dothideomycetes(100);o__Pleosporales(100);f__Pleosporales_fam_Incertae_sedis(100);g__Pseudorobillarda(100);s__Pseudorobillarda_sp(100);
Otu00007 19504 k__Fungi(100);p__Basidiomycota(100);c__Agaricomycetes(100);o__Hymenochaetales(100);f__Hymenochaetales_fam_Incertae_sedis(100);g__Resinicium(100);s__Resinicium_saccharicola(100);
Otu00008 16744 k__Fungi(100);p__Basidiomycota(100);c__Agaricomycetes(100);o__Polyporales(100);f__Xenasmataceae(100);g__Phlebiella(100);s__Phlebiella_borealis(100);
Otu00009 15053 k__Fungi(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);k__Fungi_unclassified(100);
Let me know if you need more information.
Huge thanks!!!
Cris
Thanks! Looking at the cons.taxonomy file, I think you’re using the wrong names in remove.lineage. You need to use the name exactly as written in the taxonomy file. For example, if you wanted to remove the Metazoa, you would need to say k__Metazoa, not Metazoa.
Pat
You were right, Pat.
I corrected the names that I wanted to remove in remove.lineage and it worked.
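For anyone who runs into the same thing, here is a sketch of what a corrected call could look like. Only k__Metazoa is confirmed by the taxonomy output in this thread. The k__ prefix on the other kingdoms is an assumption, and "unknown" is left as it was in the original call, so each name should be checked against your own taxonomy file before running it.

```
# Sketch only: k__Metazoa matches the taxonomy output shown above.
# The other k__ names are assumed to follow the same pattern, and "unknown"
# is unchanged from the original call; verify each against the *.taxonomy file.
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=k__Protista-unknown-k__Metazoa-k__Viridiplantae-k__Stramenopila-k__Rhizaria)

# Re-run summary.seqs to see how many sequences were actually removed.
summary.seqs(fasta=current, count=current)
```

Since remove.lineage gives no error when a name matches nothing, comparing the sequence totals before and after is a quick way to confirm that the filter did something.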
Huge thanks!
Cris