Classify.seqs Generating search database... takes forever

YanSun · May 5, 2020, 4:44pm

Hi,
I used the reference file SILVA_132_LSURef_tax_silva_trunc.fasta (593,318 KB) and the taxonomy file taxmap_slv_lsu_ref_132.txt (27,175 KB) to classify my sequences (which mothur had processed successfully up to this step), but the run hung indefinitely at

Using 8 processors.
Generating search database…

What could be the reason? Are the files too big? I tried to use get.lineage to pick the taxa of interest (Eukaryota-Bacteria;Cyanobacteria), but the selection only worked on the taxonomy file, even though I searched the fasta file and these taxa are present. Deleting the “>” symbols in the fasta file did not help either.

leocadio · May 7, 2020, 12:09am

What platform (OS) are you using? I had that problem once on Red Hat, where I was just calling the same process several times instead of distributing it…

Deleting the “>”? Why?

YanSun · May 7, 2020, 6:12am

I tried 1.41.3 and 1.42.3, neither worked.

One earlier problem I came across was something like:

ACD298849847 found in the template and is not found in the taxonomy file

An answer on the mothur forum suggested removing the “>”.

After I removed the “>”, I ran into the problem I posted here.
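That error means an ID in the reference (template) FASTA has no exact match in the first column of the taxonomy file, so deleting the “>” does not fix it. A more direct check is to list the mismatched IDs. The minimal sketch below assumes the ID is the first whitespace-delimited token after “>” and that the taxonomy file is tab-separated with the ID in the first column; the function names are hypothetical, not part of mothur.

```python
import sys


def fasta_ids(fasta_lines):
    """Yield the sequence ID (first whitespace-delimited token) of each FASTA header."""
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                yield parts[0]


def taxonomy_ids(tax_lines):
    """Yield the first tab-delimited column of each non-empty taxonomy line."""
    for line in tax_lines:
        line = line.rstrip("\r\n")
        if line:
            yield line.split("\t", 1)[0]


def missing_from_taxonomy(fasta_lines, tax_lines):
    """Return template IDs with no matching taxonomy entry, in FASTA order."""
    known = set(taxonomy_ids(tax_lines))
    return [seq_id for seq_id in fasta_ids(fasta_lines) if seq_id not in known]


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_ids.py ref.fasta ref.taxonomy
    with open(sys.argv[1], encoding="utf-8") as fa, open(sys.argv[2], encoding="utf-8") as tax:
        for seq_id in missing_from_taxonomy(fa, tax):
            print(seq_id)
```

If this prints IDs like AY224379.2894.5819, the usual cause is that the taxonomy file still has the accession, start, and stop in separate columns.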

YanSun · May 7, 2020, 6:16am

Forgot to mention that I’m using Windows 10.

pschloss · May 7, 2020, 5:13pm

Can you try to upgrade to the most recent version of mothur? Those versions are quite old at this point. A few questions…

How many lines are in SILVA_132_LSURef_tax_silva_trunc.fasta? (In Windows I think you can get this by running find /c /v "" SILVA_132_LSURef_tax_silva_trunc.fasta from the command line.)
How many lines are in taxmap_slv_lsu_ref_132.txt?
Can you post the first few lines of both files?
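A cross-platform alternative to find /c /v "" is a few lines of Python. This sketch counts all lines and, separately, the lines starting with “>”; in a FASTA file the latter is the number of sequences, which can be compared with the number of entries in the taxonomy file. The script name in the usage comment is hypothetical, and the total may differ slightly from find's count depending on how the last line ends.

```python
import sys


def count_lines_and_headers(lines):
    """Return (total lines, lines starting with '>') for an iterable of text lines."""
    n_lines = n_headers = 0
    for line in lines:
        n_lines += 1
        if line.startswith(">"):
            n_headers += 1
    return n_lines, n_headers


if __name__ == "__main__":
    # Usage (hypothetical): python count_lines.py SILVA_132_LSURef_tax_silva_trunc.fasta taxmap_slv_lsu_ref_132.txt
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            n_lines, n_headers = count_lines_and_headers(handle)
        print(f"{path}: {n_lines} lines, {n_headers} lines starting with '>'")
```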

Pat

YanSun · May 8, 2020, 6:30am

Hi Pat,
There are 7484196 lines in the fasta file, and 198844 lines in the taxonomy file.
The first lines in both files:
AY224383.3948.6873 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus cereus
GGUUAAGUUAGAAAGGGCGCACGGUGGAUGCCUUGACACUAGGAGUCGAUGAAGGACGGGACUAACGCCGAUAUGCUUCG
GGGAGCUGUAAGUAAGCUUUGAUCCGAAGAUUUCCGAAUGGGGAAACCCACCAUACGUAAUGGUAUGGUAUCCUUAUCUG
GAUUUCCGAAUGGGGAAACCCACCAUACGUAAUGGUAUGGUAUCCUUAUCUG

primaryAccession.start.stop path organism_name taxid
AY224379.2894.5819 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus; Bacillus cereus 815
AY224380.2894.5819 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus; Bacillus cereus 815
AY224381.2668.5593 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus; Bacillus cereus 815

I have deleted the “>” in the fasta file. In the taxonomy file, the first column was originally three separate columns: primaryAccession (e.g. AY224379), start (e.g. 2894), and stop (e.g. 5819). I combined them into one column because, before I did, I got an error saying AY224379.2894.5819 was in the template file but not in the taxonomy file.

pschloss · May 8, 2020, 11:38am

A couple of things stand out as problems…

You must have the > in a fasta file. That’s part of what makes it a fasta file.
You need to get rid of the “primaryAccession.start.stop path organism_name taxid” line in your taxonomy file
The second column of your taxonomy file cannot have spaces in it (e.g. “; Bacillus cereus 815”)
The last character of each taxonomy string needs to be a “;”
What version of mothur are you using?
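The taxonomy-file requirements above can be applied in one pass over the original SILVA taxmap. This is a minimal sketch, not the procedure from the README: it assumes the taxmap is tab-separated with the columns primaryAccession, start, stop, path, organism_name, taxid and a header line starting with “primaryAccession”. It joins the first three columns into the same accession.start.stop ID used in the FASTA headers, keeps only the path (dropping organism_name and taxid, which is one choice among several), replaces any spaces with underscores, and makes sure each taxonomy ends with “;”. The FASTA file is left as-is, “>” included.

```python
import sys


def taxmap_to_mothur(lines):
    """Convert SILVA taxmap rows into 'ID<TAB>taxonomy;' lines for mothur."""
    out = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4 or fields[0] == "primaryAccession":
            continue  # skip the header row and blank or malformed lines
        accession, start, stop, path = fields[:4]
        seq_id = f"{accession}.{start}.{stop}"  # matches IDs like AY224379.2894.5819
        taxonomy = path.strip().replace(" ", "_")  # no spaces allowed in the taxonomy
        if not taxonomy.endswith(";"):
            taxonomy += ";"  # each taxonomy string must end with ';'
        out.append(f"{seq_id}\t{taxonomy}")
    return out


if __name__ == "__main__":
    # Usage (hypothetical): python taxmap_to_mothur.py taxmap_slv_lsu_ref_132.txt lsu.tax
    with open(sys.argv[1], encoding="utf-8") as src, open(sys.argv[2], "w", encoding="utf-8") as dst:
        for row in taxmap_to_mothur(src):
            dst.write(row + "\n")
```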

I would encourage you to follow the README for constructing the SILVA reference files to see how we generated the one we provide users so that you can adapt it for your data.

Pat

YanSun · May 8, 2020, 7:53pm

Hi Pat,
I did what you suggested, and now classify.seqs works. But get.lineage(fasta=SILVA_132_LSURef_tax_silva_trunc.fasta, taxonomy=taxmap_slv_lsu_ref_132.txt, taxon=Eukaryota-Bacteria;Cyanobacteria) still only works on the taxonomy file, not on the fasta file.
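To check independently of mothur whether the requested taxa are really in the FASTA file, the selection can be reproduced by hand. This sketch is an approximation, not mothur's get.lineage implementation: it reads the dash in the taxon string as a separator between taxa, treats each taxon as a prefix of the taxonomy string, and assumes the tab-separated taxonomy format described above. Function names are hypothetical.

```python
def ids_for_taxa(tax_lines, taxa):
    """Return IDs whose taxonomy string starts with any of the given taxon prefixes."""
    # Append ';' so 'Bacteria;Cyanobacteria' cannot match 'Bacteria;CyanobacteriaX'.
    prefixes = tuple(t if t.endswith(";") else t + ";" for t in taxa)
    keep = set()
    for line in tax_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        seq_id, _, taxonomy = line.partition("\t")
        if taxonomy.startswith(prefixes):
            keep.add(seq_id)
    return keep


def filter_fasta(fasta_lines, keep):
    """Yield header and sequence lines for FASTA records whose ID is in keep."""
    writing = False
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            writing = bool(parts) and parts[0] in keep
        if writing:
            yield line
```

For example, ids_for_taxa(tax_lines, "Eukaryota-Bacteria;Cyanobacteria".split("-")) selects every Eukaryota entry plus every Bacteria;Cyanobacteria entry, and filter_fasta then keeps only those records. An empty result would point to an ID mismatch between the two files rather than a mothur problem.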

pschloss · May 8, 2020, 8:33pm

What version of mothur are you using?

YanSun · May 9, 2020, 6:11am

I tried 1.41.3, 1.42.3 and 1.43

pschloss · May 9, 2020, 10:19am

Can you try the latest version that was posted?

YanSun · May 12, 2020, 3:56pm

Hi Pat,
I tried version 1.44.1, and now everything works. Thanks for the help!

Yan

system · May 22, 2020, 3:56pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.
