How to create a taxonomy file?

I’m following the 454 SOP but I get stuck at some steps since my data doesn’t have taxonomy files. How does one go about creating taxonomy files? My data has fasta, names and group files.

The classify.seqs command, http://www.mothur.org/wiki/Classify.seqs, will create a taxonomy file. You may also find this link helpful, http://www.mothur.org/wiki/Taxonomy_outline.

Great thanks, I had a look but I’m still having some trouble. What can we use as a reference sequence? Can we use silva.nr_v119.align file as the database file and silva.nr_v119.tax as the reference file?

You should be able to. The classify.seqs command does not require an aligned reference, so you may want to run degap.seqs on the reference file first to save memory.

mothur > degap.seqs(fasta=silva.nr_v119.align)
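
For reference, degapping amounts to stripping the alignment gap characters ('-' and '.') from each sequence line, which is why the degapped reference is so much smaller. Here is a rough Python sketch of the idea. It is not mothur's actual code, and the function name degap is made up for illustration:

```python
def degap(lines):
    """Yield FASTA lines with gap characters removed from sequence lines.

    Illustrative only: header lines (starting with '>') pass through
    unchanged; '-' and '.' are stripped from every other line.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield line.replace("-", "").replace(".", "")


if __name__ == "__main__":
    # Made-up aligned record, not taken from the Silva files.
    aligned = [">seqA", "..AC-GT--."]
    for out in degap(aligned):
        print(out)
```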

So I did the following:

degap.seqs(fasta=silva.nr_v119.align)

Then I used the output in:

classify.seqs(fasta=cdical.pick.fasta, template=silva.nr_v119.tax, taxonomy=silva.nr_v119.ng.fasta)

Right at the beginning I had some sort of warning about mothur not being set up to analyze proteins (I don’t remember the exact phrasing), then 70000 seconds later:

[ERROR]: >JQ360191.BctEn672 is missing the final ‘;’, ignoring

As well as:

‘J440719.PutAgent’ is in your template file and is not in your taxonomy file. Please correct.

And also:

[ERROR]: You are missing (
Invalid.

Could somebody please give me some hints as to what I’m doing wrong?

It sounds like the read in classify.seqs got off track. Could you try it again with processors=1? Also, if you set the debug flag, mothur will let you know what it’s reading.

mothur > set.dir(debug=t)

Thank you for all your help, still getting some errors though. This time I put in the extra processors=1 statement:

classify.seqs(fasta=cdical.pick.fasta, template=silva.nr_v119.tax, taxonomy=silva.nr_v119.ng.fasta, processors=1)

Using 1 processors.
Generating search database… [WARNING]: We found more than 25% of the bases in sequence J440719.PutAgent to be ambiguous. Mothur is not setup to process protein sequences.
DONE.
It took 0 seconds generate search database.
unknown unknown

Reading in the silva.nr_v119.ng.fasta taxonomy… [DEBUG]: Taxonomies read in…
[ERROR]: >AJ440719.PutAgent is missing the final ‘;’, ignoring.
[DEBUG]: name = ‘>AJ440719.PutAgent’ tax = 'GCGAACGCTGGCGG…

And so on, this repeats for quite a while and then at the end:

[DEBUG]: numSeqs saved = ‘0’
0 unknown
maxLevel = 1
unclassified
unclassified
DONE.
[DEBUG]: in error check. Numseqs in template = 2. Numseqs in taxonomy = 1.
‘J440719.PutAgent’ is in your template file and is not in your taxonomy file. Please correct.
[DEBUG]: about to generateWordPairDiffArr
[DEBUG]: done generateWordPairDiffArr
DONE.
It took 152 seconds get probabilities.

Any ideas? Should I try a different version of mothur? I’m using the 4/4/2014 release.

Can you try it with our current version? Also, can you redownload the reference files, http://www.mothur.org/wiki/Silva_reference_alignment?

I’ve tried it with the latest mothur version and have redownloaded the Silva database but I’m getting the same errors. Do you have any other suggestions I could try?

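Your template and taxonomy files are swapped: template should be the reference sequences (the degapped fasta) and taxonomy should be the .tax file. That is why mothur was trying to read sequence data as taxonomy strings. You have: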

classify.seqs(fasta=cdical.pick.fasta, template=silva.nr_v119.tax, taxonomy=silva.nr_v119.ng.fasta, processors=1)

You want:

classify.seqs(fasta=cdical.pick.fasta, taxonomy=silva.nr_v119.tax, template=silva.nr_v119.ng.fasta, processors=1)
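
If you want to catch this kind of mix-up before waiting on a long run, a quick check along these lines might help. This is only a rough Python sketch, not anything built into mothur. The function names guess_kind and check_arguments and the sample lines are made up for illustration. It assumes that a fasta file starts with a '>' header line and that each taxonomy line is a sequence name, whitespace, then a taxonomy string ending in ';'.

```python
def guess_kind(lines):
    """Guess whether lines look like a FASTA file or a taxonomy file.

    Only the first non-blank line is inspected, so this is a heuristic,
    not a validator.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            return "fasta"
        parts = line.split(None, 1)  # name, then the rest of the line
        if len(parts) == 2 and parts[1].endswith(";"):
            return "taxonomy"
        return "unknown"
    return "empty"


def check_arguments(template_lines, taxonomy_lines):
    """Report whether the template/taxonomy pair looks ok, swapped, or unclear."""
    template_kind = guess_kind(template_lines)
    taxonomy_kind = guess_kind(taxonomy_lines)
    if template_kind == "fasta" and taxonomy_kind == "taxonomy":
        return "ok"
    if template_kind == "taxonomy" and taxonomy_kind == "fasta":
        return "swapped"
    return "unclear"


if __name__ == "__main__":
    # Made-up example records, not taken from the Silva files.
    fasta_lines = [">seqA", "ACGTACGT"]
    tax_lines = ["seqA\tBacteria;Firmicutes;"]
    print(check_arguments(template_lines=tax_lines, taxonomy_lines=fasta_lines))
```

To use it on real files, you could pass open file handles instead of lists, for example check_arguments(open("silva.nr_v119.ng.fasta"), open("silva.nr_v119.tax")), since only the first non-blank line of each is read.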