[Name] is already in your taxonomy file. Names must be unique

Hello!
While running classify.seqs, I get the following errors:
“[ERROR]: Actinobacteria; is already in your taxonomy file, names must be unique.
[ERROR]: Actinomycetales; is already in your taxonomy file, names must be unique.
[ERROR]: Actinomyces; is already in your taxonomy file, names must be unique.
[ERROR]: Actinobacteria; is already in your taxonomy file, names must be unique.
[ERROR]: Actinomycetales; is already in your taxonomy file, names must be unique.”

My classify.seqs code is below:

classify.seqs(seed=777, fasta=/scratch/bfoxman_fluxm/emenard/VIP/output/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, template=/scratch/bfoxman_fluxm/emenard/VIP/ref/oralDB_012814.fasta, taxonomy=/scratch/bfoxman_fluxm/emenard/VIP/ref/oralDB_012814.txt, method=wang, probs=T, cutoff=80)

What is going on?

Best,
Emily

I suspect that oralDB_012814.txt has spaces in it where they aren’t supposed to be. You should go back into it and make sure it follows the formatting that we use in the trainset taxonomy files.

Pat
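
A quick way to look for the stray spaces mentioned above is a small script that scans the taxonomy file line by line. This is only a sketch: it assumes the layout used in the trainset taxonomy files is one sequence name, a single tab, then a semicolon-separated taxonomy string that ends in ";" and contains no spaces. The function name `find_taxonomy_problems` and the sample lines are made up for illustration.

```python
# Sketch of a format check for a mothur-style taxonomy file.
# ASSUMED layout per line: <name><TAB><level1>;<level2>;...;  (no spaces).

SAMPLE = (
    "seq1\tBacteria;Actinobacteria;Actinomycetales;\n"
    "seq2\tBacteria; Actinobacteria;Actinomycetales;\n"  # stray space
    "seq3 Bacteria;Firmicutes;\n"                        # space instead of tab
)


def find_taxonomy_problems(lines):
    """Return (line_number, message) pairs for lines that break the assumed layout."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            problems.append((number, "expected exactly one tab between name and taxonomy"))
            continue
        name, taxonomy = parts
        if " " in name or " " in taxonomy:
            problems.append((number, "contains a space"))
        if not taxonomy.endswith(";"):
            problems.append((number, "taxonomy does not end with ';'"))
        elif any(level == "" for level in taxonomy[:-1].split(";")):
            problems.append((number, "empty taxonomic level"))
    return problems


if __name__ == "__main__":
    for number, message in find_taxonomy_problems(SAMPLE.splitlines()):
        print(f"line {number}: {message}")
```

To check a real file, pass an open file handle instead of the sample, for example `find_taxonomy_problems(open("oralDB_012814.txt"))`.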

@pschloss Hello. Please do not think this is a dumb question. I am trying to do something similar with the classify.seqs command, but I am new to Mothur and bioinformatics. I am stuck on what to supply for the taxonomy and template files. Am I supposed to make a template file and a taxonomy file myself? I am trying to use the SILVA database. I have downloaded the ARB files for the newest SSU SILVA database provided at this link, and I have read the information in the following links, but I am still not quite sure how to make these taxonomy and template files.

This is the command line I am using and the error message I am receiving if it helps.
mothur > classify.seqs(fasta=/home/cxlab/Downloads/R_2021_10_05_12_00_01_user_S5-70-10-05-2021_Chip.good.good.trim.fasta, method=wang, taxonomy=/home/cxlab/Downloads/silva.seed_v138_1.tgz)

Using 8 processors.
[ERROR]: The reference parameter is a required for the classify.seqs command.
[ERROR]: did not complete classify.seqs.

UPDATE: I tried the following command to see whether this is how you are supposed to make a template file, but I got the errors below.

mothur > align.seqs(candidate=/home/cxlab/Downloads/Mothur.win/mothur/R_2021_10_05_12_00_01_user_S5-70-10-05-2021_Chip.good.good.trim(copy).fasta, template=/home/cxlab/Downloads/Mothur.win/mothur/silva.seed_v138_1(copy).tgz)

Using 8 processors.

Reading in the /home/cxlab/Downloads/Mothur.win/mothur/silva.seed_v138_1(copy).tgz template sequences…
[WARNING]: We found more than 25% of the bases in sequence [unreadable binary data] to be ambiguous. Mothur is not setup to process protein sequences.
[WARNING]: We found more than 25% of the bases in sequence [unreadable binary data] to be ambiguous. Mothur is not setup to process protein sequences.
[ERROR]: template is not aligned, aborting.
DONE.
It took 0 to read 0 sequences.

It looks like you’re trying to give a tgz file to the template argument. Instead, you need to decompress the tgz file and pull out the relevant files. You can see the proper syntax on the MiSeq SOP wiki page. You can get the pre-generated reference files here:

Pat
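
The decompress-and-pull-out step described above can be sketched with Python's standard `tarfile` module. The helper name `extract_reference` is hypothetical, and the demo builds a tiny stand-in archive rather than using the real SILVA download, so the file names inside it are placeholders.

```python
# Sketch: unpack a .tgz reference archive and locate the .align and .tax files.
import tarfile
import tempfile
from pathlib import Path


def extract_reference(archive_path, out_dir):
    """Unpack a gzip-compressed tar archive and return the .align/.tax paths found."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(out_dir)
    found = {}
    for path in sorted(out_dir.rglob("*")):
        if path.suffix in (".align", ".tax"):
            found[path.suffix] = path
    return found


if __name__ == "__main__":
    # Build a small placeholder archive so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        src = tmp / "src"
        src.mkdir()
        (src / "demo.align").write_text(">seq1\nA-C-G-T\n")
        (src / "demo.tax").write_text("seq1\tBacteria;\n")
        archive = tmp / "demo.tgz"
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(src / "demo.align", arcname="demo/demo.align")
            tar.add(src / "demo.tax", arcname="demo/demo.tax")
        files = extract_reference(archive, tmp / "out")
        print("reference:", files[".align"])
        print("taxonomy: ", files[".tax"])
```

The two paths it returns are the kind of files that go to the reference and taxonomy parameters of classify.seqs, rather than the .tgz archive itself.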

@pschloss Thank you so much for your prompt response. I unzipped the silva.seed_v138_1.tgz file, and it contains two files: silva.seed_v138_1(1)/silva.seed_v138_1.align and silva.seed_v138_1(1)/silva.seed_v138_1.tax. I am guessing that silva.seed_v138_1(1)/silva.seed_v138_1.align is the reference and silva.seed_v138_1(1)/silva.seed_v138_1.tax is the taxonomy file created by your team using the newest SILVA SSU database?

My initial multiplexed file was generated via Ion Torrent using the Ion GeneStudio S5 machine instead of MiSeq. Now I have some more questions. Am I supposed to generate a count table? If so, I am still confused about which file I should use to do this. Or would it work to run the following command on the de-multiplexed file I generated using Mothur on my first attempt? Or am I missing a step?

mothur > classify.seqs(fasta=/home/cxlab/Downloads/R_2021_10_05_12_00_01_user_S5-70-10-05-2021_Chip.good.good.trim.fasta, method=wang, reference=/home/cxlab/Downloads/silva.seed_v138_1(1)/silva.seed_v138_1.align, taxonomy=/home/cxlab/Downloads/silva.seed_v138_1(1)/silva.seed_v138_1.tax)

Many thanks to you and your team for providing this resource and for your help on this forum; it is much appreciated. I look forward to hearing back from you.

Hi,

The count table is generated using the group and names files. The names file can be generated in unique.seqs. The group file will need to be generated using something like trim.seqs or using a script that you write.

Pat
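
To make the relationship between the names file, the group file, and the count table concrete, here is a rough illustration of how the two inputs combine. It is not mothur's implementation. It assumes the names file holds a representative name, a tab, then a comma-separated list of the identical sequences it stands for, and that the group file holds a sequence name, a tab, then a group (sample) name. The sequence and sample names are invented.

```python
# Sketch: combine names-file and group-file information into per-group counts.
from collections import Counter

NAMES = ["seqA\tseqA,seqB,seqC", "seqD\tseqD"]        # representative -> members
GROUPS = ["seqA\tsample1", "seqB\tsample1",
          "seqC\tsample2", "seqD\tsample2"]           # sequence -> sample


def build_count_table(names_lines, group_lines):
    """Return (header, rows) with one row of per-group counts per representative."""
    group_of = {}
    for line in group_lines:
        line = line.strip()
        if line:
            seq, group = line.split("\t")
            group_of[seq] = group
    groups = sorted(set(group_of.values()))

    rows = []
    for line in names_lines:
        line = line.strip()
        if not line:
            continue
        representative, members = line.split("\t")
        # Count how many of the duplicate sequences came from each sample.
        counts = Counter(group_of[m] for m in members.split(","))
        per_group = [counts.get(g, 0) for g in groups]
        rows.append([representative, sum(per_group)] + per_group)
    return ["Representative_Sequence", "total"] + groups, rows


if __name__ == "__main__":
    header, rows = build_count_table(NAMES, GROUPS)
    print("\t".join(header))
    for row in rows:
        print("\t".join(str(value) for value in row))
```

In this toy input, seqA represents three reads, two from sample1 and one from sample2, so its row carries a total of 3 split across the two samples.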