classify.seqs possible bug

I’m trying to run classify.seqs using the fungal Unite ITS database of around 200k sequences. I get slightly puzzling log output (below) and the command doesn’t complete properly: it doesn’t output the final taxonomy files, and the .prob file doesn’t look right, having only one entry. I’ve also tried it without the .names file (on the original non-uniqified sequence set) but got the same result. I checked the taxonomy file format and the correspondence between it and the template file (which is just a fasta file of unaligned sequences). All seems to be well. My taxonomy lines look like this, e.g.: UDB000009 fungi;basidiomycota;agaricomycotina;agaricomycetes;

I’m running it as the only process on a machine with 12 CPUs and 48 GB of RAM.
The log file:

Running 64Bit Version

mothur v.1.21.1
Last updated: 8/11/2011

by
Patrick D. Schloss

Department of Microbiology & Immunology
University of Michigan
pschloss@umich.edu
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type ‘help()’ for information on the commands that are available

Type ‘quit()’ to exit program
Interactive Mode


mothur > classify.seqs(fasta=all.unique.seq,name=all.names,cutoff=60,template=Unite.format.seq,taxonomy=Unite.taxonomy ,group=groups,processors=10)

Reading in the Unite.taxonomy taxonomy… DONE.
Generating search database… is in your template file and is not in your taxonomy file. Please correct.
DONE.
It took 1670 seconds generate search database.
DONE.
It took 1671 seconds get probabilities.

By the way, I’d also mention the forum issue I raised before: when you search the forum for a command name, say classify.seqs, the search is blocked with a message saying the search phrase is too common. That seems wrong, because the command names are exactly the terms I most often want to search for.

It looks like mothur is giving you an error message about your template and taxonomy files not matching: “is in your template file and is not in your taxonomy file. Please correct.” I suspect it’s a file format issue. If you want to send your template and taxonomy files to mothur.bugs@gmail.com, I can try to spot the error for you.
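For anyone who wants to check for this kind of mismatch themselves before sending files, here is a minimal Python sketch. It is not part of mothur and does not reproduce how mothur parses files. It assumes the template is plain fasta with the sequence name as the first token after ">", and that each taxonomy line starts with the sequence name followed by whitespace, as in the UDB000009 example above. The function names are made up for illustration.

```python
def fasta_ids(lines):
    """Yield the name from each fasta header: the first token after '>'."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # A bare '>' header yields an empty name.
            yield fields[0] if fields else ""


def taxonomy_ids(lines):
    """Yield the first whitespace-separated field of each non-empty line."""
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[0]


def missing_from_taxonomy(fasta_lines, taxonomy_lines):
    """Return template names that have no entry in the taxonomy, in file order."""
    known = set(taxonomy_ids(taxonomy_lines))
    return [name for name in fasta_ids(fasta_lines) if name not in known]
```

To run it on real files, open both and pass the file objects, for example `missing_from_taxonomy(open("Unite.format.seq"), open("Unite.taxonomy"))`. An empty list means every template name was found in the taxonomy file.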

The problem is in the fasta file. mothur does not allow comment lines in a fasta file. If you remove the first 4 lines, you should be all set.
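If you would rather not edit a 200k-sequence file by hand, something like the following Python sketch would do it. It assumes the unwanted lines all come before the first ">" header, as in this case; comment lines elsewhere in the file would not be removed. The function names and the output file name are hypothetical.

```python
from itertools import dropwhile


def strip_leading_comments(lines):
    """Return the lines from the first fasta header ('>') onward."""
    return list(dropwhile(lambda line: not line.startswith(">"), lines))


def clean_fasta(src_path, dst_path):
    """Copy src_path to dst_path, dropping everything before the first header."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(strip_leading_comments(src))
```

For example, `clean_fasta("Unite.format.seq", "Unite.format.clean.seq")` writes a cleaned copy and leaves the original untouched, so you can compare the two before pointing template= at the new file.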

Sorry, obvious user error. Although the log file could be a bit clearer on the problem :smiley:

Dear wmnwmn,

I’m also trying to run fungal ITS sequences from Ion Torrent data through mothur, although I haven’t got as far as you have. I have hit a problem at align.seqs(fasta=myfile.shhh.trim.unique.fasta, reference=mysteryfile, processors=2): I do not have a fungal reference fasta file. Could you (or anyone else) be so kind as to tell me how you got around this issue? Did you make a reference file, or is there one available? I have tried all the files on the UNITE web site, and in mothur an error just pops up saying the file is not aligned…

Any help would be greatly appreciated, as no one else in my research group is using mothur.

Kind regards,

Bede

University of Western Australia PhD candidate