classify.seqs taxonomy file error

Hello!

It is probably a newbie error, but I am having problems running classify.seqs with the greengenes database.

I am using the latest mothur GUI on Windows 7.

For reference, I am using the fasta file from the greengenes second website (can’t find it on the mothur wiki…) and the taxonomy file present on the mothur wiki.

Here is my output.

mothur > classify.seqs(fasta=C:\Users\Alexandre et Sophie\Documents\Programme\mothur\mothurGUI\thibodeau_pouletjanv2013.trim.fasta, group=C:\Users\Alexandre et Sophie\Documents\Programme\mothur\mothurGUI\thibodeau_pouletjanv2013.groups, reference=C:\Users\Alexandre et Sophie\Documents\Programme\mothur\Greengenes\gg_13_5.fasta, taxonomy=C:\Users\Alexandre et Sophie\Documents\Programme\mothur\Greengenes\Gg_13_5_99.taxonomy, cutoff=51)

Using 1 processors.
Generating search database… DONE.
It took 1443 seconds generate search database.

Reading in the C:\Users\Alexandre et Sophie\Documents\Programme\mothur\Greengenes\Gg_13_5_99.taxonomy taxonomy… [ERROR]: ./._gg_13_5_99.pds.tax 000644 is missing the final ‘;’, ignoring.


It does not seem to be ignoring it. I have run this command multiple times using 3 processors (I have 4) for more than 24 hrs without getting any message that the pipeline finished or seeing any output files.

Everything is fine using RDP and Silva.

How can I correct this problem? I want to use greengenes as I am analysing caecal 16S chicken sequences obtained from an Ion Torrent run.

Thank you very much for your precious time.

Here’s a link to mothur’s GreenGenes files, http://www.wiki.mothur.org/wiki/Greengenes-formatted_databases. From the output you posted it looks like the files are not being read correctly. Could you try setting the debug flag to see what mothur is reading?

set.dir(debug=t)
classify.seqs(…)
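Independently of the debug flag, a quick check outside mothur can show whether the file passed as taxonomy= is really a plain-text taxonomy file. The sketch below is only an illustration: check_taxonomy_file is a hypothetical helper, not part of mothur, and it assumes each line holds a sequence name, whitespace, and a taxonomy string ending in ';'. It also uses Python's tarfile.is_tarfile to flag a file that is in fact still a tar archive.

```python
import tarfile


def check_taxonomy_file(path):
    """Return (is_tar, bad_lines) for a candidate taxonomy file.

    is_tar    -- True if the file is actually a tar archive, not text.
    bad_lines -- (line_number, preview) pairs for lines that do not look
                 like "name<whitespace>taxonomy;" (assumed layout).
    """
    path = str(path)
    if tarfile.is_tarfile(path):
        # An archive is not a taxonomy file; no point reading it as text.
        return True, []

    bad_lines = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            fields = line.split(None, 1)
            if len(fields) != 2 or not fields[1].endswith(";"):
                bad_lines.append((number, line[:60]))
    return False, bad_lines
```

Calling check_taxonomy_file on the path given to taxonomy= and printing the result should be enough to tell a still-packed archive from a text file with a few malformed lines.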

Thank you for your answer!

Alright, I will try the debug today!

I am already using the taxonomy file that you are suggesting.

On the page, there seem to be 3 files for download:

the taxonomy file for classify: greengenes reference taxonomy

the alignment file for chimera: greengenes gold alignment

the reference alignment for alignment: greengenes reference alignment

Where is the .fasta file for classify? Sorry, it seems that I cannot find it on the page! It’s probably me but I really cannot find it!

I’ll be back at the end of the day with the debug results.

If you download and unzip the Gg_13_5_99.taxonomy folder from the “greengenes reference taxonomy” link, it contains 4 files: gg_13_5_99.fasta, gg_13_5_99.gg.tax, gg_13_5_99.pds.tax and pds.notes. You can run: classify.seqs(fasta=yourSequences, reference=gg_13_5_99.fasta, taxonomy=gg_13_5_99.gg.tax).

Hello!

I will run the debug tonight.

I feel really stupid now, but with the taxonomy file (in the current stuff section), when I decompress the Gg_13_5_99.taxonomy.tar file I only get 1 file, which is Gg_13_5_99.taxonomy, and if I open it in Notepad I only see the taxonomy, nothing else. So weird!

I have a printscreen to prove it!

Sorry again, this is so puzzling!

When I look at the link using properties I get the following URL

http://www.wiki.mothur.org/w/images/9/9d/Gg_13_5_99.taxonomy.tgz

but the file I get is Gg_13_5_99.taxonomy.tar

Is there a difference between the .tgz and .tar file?

Okay, I think I see what’s going on. Your machine likely decompressed the tgz to a tar, but did not complete the decompression all the way to the GreenGenes Folder. Can you try double clicking on the .tar file to see if your machine will decompress the file for you? If not, you will have to open a command prompt and use tar to decompress the tar file into the folder I was describing above.
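If no tar tool is at hand, the same unpacking can be sketched with Python's standard tarfile module. This is a minimal sketch, not an official mothur utility: extract_archive is a hypothetical helper, and skipping entries whose names start with "._" is an assumption based on the "./._gg_13_5_99.pds.tax" name in the error message, which looks like a macOS metadata entry inside the archive.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a .tgz or .tar archive; return the names of extracted files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Mode "r:*" lets tarfile detect gzip compression (or none) on its own,
    # so the same call handles both the .tgz and the half-unpacked .tar.
    with tarfile.open(str(archive_path), "r:*") as archive:
        members = [
            member for member in archive.getmembers()
            if member.isfile() and not Path(member.name).name.startswith("._")
        ]
        # Newer Python versions support extraction filters; use the safe
        # "data" filter when the running interpreter provides it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest, members=members, **extra)

    return sorted(member.name for member in members)
```

Pointing it at the downloaded Gg_13_5_99.taxonomy.tgz (or at the .tar that Windows produced) and printing the returned list should show the fasta and tax files described above.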

Morning!

I was able to get the files! The .taxonomy file that I got (by decompressing the .tar file from the wiki) was indeed an archive file that Windows did not detect. By opening it further with WinRAR, I was able to find the files!

They were well hidden.

I will try to run mothur with these files now and see if I still get the bug!

I will come back to you as soon as it is finished!

It is classifying!

Many thanks for the support!

Have a nice day!

Hello,
I am new to mothur and I am attempting to analyse my samples for the first time.
As far as I can tell from my log file, I have followed all the steps correctly. I am now at the stage where I am trying to classify my sequences using the following:

classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)

Unfortunately, after the run an ERROR message appears at the end of the output and no wang file is generated. The ERROR message is as follows:
[ERROR]: HWI-M02748_22_000000000-AAEY5_1_2112_8851_4245 is already in your taxonomy file, names must be unique

I deleted all the trainset files and downloaded the two I need from the site again and re-ran the classify.seqs command but I still get the same ERROR message as the output and no wang file to proceed onto the next stage.

Any help you could give would be greatly appreciated, or if you require more information please let me know.

Best wishes,
Kate

Can you try redownloading the training set from http://www.wiki.mothur.org/wiki/454_SOP? Also, make sure to remove the temporary files mothur makes. These should be called:

trainset9_032012.pds.8mer
trainset9_032012.pds.trainset9_032012.pds.8mer.numNonZero
trainset9_032012.pds.trainset9_032012.pds.8mer.prob
trainset9_032012.pds.tree.sum
trainset9_032012.pds.tree.train
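Removing these by hand works fine; for repeated runs, a small script can do the same. The sketch below is only an illustration: remove_classifier_temp_files is a hypothetical helper, the file names are copied from the list above, and the directory you pass in is assumed to be the one holding the training set.

```python
from pathlib import Path

# Temporary file names as listed above for trainset9_032012.pds
TEMP_FILES = [
    "trainset9_032012.pds.8mer",
    "trainset9_032012.pds.trainset9_032012.pds.8mer.numNonZero",
    "trainset9_032012.pds.trainset9_032012.pds.8mer.prob",
    "trainset9_032012.pds.tree.sum",
    "trainset9_032012.pds.tree.train",
]


def remove_classifier_temp_files(directory, names=TEMP_FILES):
    """Delete the listed files if present; return the names removed."""
    removed = []
    for name in names:
        candidate = Path(directory) / name
        if candidate.is_file():
            candidate.unlink()
            removed.append(name)
    return removed
```

Only files whose names match the list exactly are touched, so the reference fasta and taxonomy files stay in place.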

Are you using the latest version of mothur, 1.34?
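For the "names must be unique" error, it may also help to check outside mothur whether the taxonomy file really contains repeated names. A minimal sketch, assuming the first whitespace-separated field on each line is the sequence name; duplicate_names is a hypothetical helper, not a mothur command.

```python
from collections import Counter


def duplicate_names(taxonomy_path):
    """Return the sorted sequence names that occur more than once."""
    counts = Counter()
    with open(taxonomy_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            fields = line.split(None, 1)
            if fields:  # ignore blank lines
                counts[fields[0]] += 1
    return sorted(name for name, seen in counts.items() if seen > 1)
```

An empty list for trainset9_032012.pds.tax would suggest the file itself is fine and that the repeated name is coming from somewhere else, such as leftover temporary files.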