mothur

Problems using make.biom for PICRUSt

#1

Hello there.

Using mothur v.1.40, I am trying to build a .biom file for PICRUSt with the ‘make.biom’ command. This is what I run (following the biom format for PICRUSt):

classify.seqs(fasta=my_data.fasta, template=gg_13_5.fasta, taxonomy=gg_13_5_taxonomy_fin.tax)
make.biom(shared=my_data.phylip.opti_mcc.0.03.subsample.shared, label=0.03, reftaxonomy=gg_13_5_taxonomy_fin.tax, constaxonomy=my_data.gg_13_5_taxonomy_fin.wang.taxonomy, picrust=97_otu_map.map)

I get the following error in my logfile (the terminal shut down automatically):

[ERROR]: std::bad_allocRAM used: 1.95832Gigabytes . Total Ram: 7.90855Gigabytes.

 has occurred in the MakeBiomCommand class function getMetadata. This error indicates your computer is running out of memory.  This is most commonly caused by trying to process a dataset too large, using multiple processors, or a file format issue. If you are running our 32bit version, your memory usage is limited to 4G.  If you have more than 4G of RAM and are running a 64bit OS, using our 64bit version may resolve your issue.  If you are using multiple processors, try running the command with processors=1, the more processors you use the more memory is required. Also, you may be able to reduce the size of your dataset by using the commands outlined in the Schloss SOP, http://www.mothur.org/wiki/Schloss_SOP. If you are unable to resolve the issue, please contact Pat Schloss at mothur.bugs@gmail.com, and be sure to include the mothur.logFile with your inquiry.

The dataset is not very big (I’ve also tried a subsampled file of only 100 seqs), and I am using the 64-bit version, so maybe it is a file format error? Or a mismatch between versions of the Greengenes database? I also tried the 99_otu_map version and still have the same problem.

I was wondering how to get a .cons.taxonomy file from the ‘classify.seqs’ command, since maybe this is the main issue? I tried running ‘classify.otu’ after ‘classify.seqs’ and I get a cons.taxonomy file that can then be passed to ‘make.biom’, but instead of a .biom output file I get a ‘.sum’ output file that does not show the abundance of each OTU per sample (taken from the ‘.shared’ file?), as it is supposed to when following the PICRUSt tutorial.

How can I solve this problem?

Thanks a lot
Miguel Ángel

#2

Would you mind giving the latest version a shot? There have been some modifications since 1.40 was released that might help

#3

Thanks Pat.
I am trying exactly the same thing with the v.1.41.0 mothur executable and it seems to be working… although it is taking a long time… and I am running the commands on a server with 64 GB of RAM!

I am going to wait a bit longer (say, a couple of hours), and if I get any result (good or bad) I will write again.

However, I still wonder if the problem is the .taxonomy file used as ‘constaxonomy’ input, what do you think?

Cheers

#4

Finally, after 4 hours of analysis, I got the same error, but with nearly 64 GB of RAM used.

Any idea what I can do?

Thanks a lot

#5

“However, I still wonder if the problem is the .taxonomy file used as ‘constaxonomy’ input, what do you think?”

Using a taxonomy file instead of a constaxonomy file will cause issues. The two files have different formats, so mothur will have trouble reading the taxonomy file, which could cause unexpected behavior and the bad_alloc. You can generate a *.cons.taxonomy file using the classify.otu command, https://www.mothur.org/wiki/Classify.otu.
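
If you are not sure which of the two formats a given file is in, a quick check outside mothur can help. The Python sketch below is only a rough illustration, not part of mothur: it assumes a per-sequence taxonomy file has two tab-separated columns (sequence name, lineage) and a cons.taxonomy file starts with an `OTU`, `Size`, `Taxonomy` header. The function name `guess_mothur_tax_format` and the sample rows (including the bootstrap scores) are made up for the example.

```python
def guess_mothur_tax_format(lines):
    """Guess whether the lines look like a per-sequence taxonomy file
    or a consensus (cons.taxonomy) file. Heuristic only."""
    # Split every non-empty line into its tab-separated columns.
    rows = [ln.strip().split("\t") for ln in lines if ln.strip()]
    if not rows:
        return "empty"
    header = [col.strip().lower() for col in rows[0]]
    # Assumption: cons.taxonomy files begin with an OTU/Size/Taxonomy header.
    if header[:3] == ["otu", "size", "taxonomy"]:
        return "cons.taxonomy"
    # Assumption: per-sequence taxonomy files have exactly two columns.
    if all(len(r) == 2 for r in rows):
        return "taxonomy"
    return "unknown"


# Illustrative rows only; the lineages and scores are invented.
seq_tax = ["M02255_282_000000000-BYPTP_1_1101_4472_19743\tk__Bacteria(100);p__Firmicutes(98);\n"]
cons_tax = ["OTU\tSize\tTaxonomy\n", "Otu0001\t120\tk__Bacteria(100);p__Firmicutes(100);\n"]

print(guess_mothur_tax_format(seq_tax))
print(guess_mothur_tax_format(cons_tax))
```

To run it on a real file, pass an open file handle, e.g. `guess_mothur_tax_format(open(path))`. If the file given to constaxonomy comes back as "taxonomy", it is the classify.seqs output rather than the classify.otu output.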

#6

Dear westcott:

Thanks for your answer. However, I had already tried what you suggest:

And I get this problem: I cannot get the ‘.biom’ output with the inputs coming from my previous ‘classify.seqs’.

Am I doing anything wrong in that step, maybe?

#7

Could you post the exact commands you are running?

#8

Here is what I have from the logfile when running the commands without classify.otu:

classify.seqs(fasta=prueba_gg.fasta, template=gg_13_5.fasta, taxonomy=gg_13_5_taxonomy_fin.tax)

Using 4 processors.
Reading template taxonomy...     DONE.
Reading template probabilities...     DONE.
It took 17 seconds get probabilities. 
Classifying sequences from prueba_gg.fasta ...
It took 1 secs to classify 100 sequences.


It took 1 secs to classify 100 sequences.


It took 0 secs to create the summary file for 100 sequences.


Output File Names: 
prueba_gg.gg_13_5_taxonomy_fin.wang.taxonomy
prueba_gg.gg_13_5_taxonomy_fin.wang.tax.summary


mothur > 
make.biom(shared=bact_ntk.trim.contigs.good.unique.good.filter.unique.precluster.pick.phylip.opti_mcc.0.03.subsample.shared, label=0.03, reftaxonomy=gg_13_5_taxonomy_fin.tax, constaxonomy=bact_ntk_GG.trim.contigs.good.unique.good.filter.unique.precluster.pick.gg_13_5_taxonomy_fin.wang.taxonomy, picrust=97_otu_map.map)
0.03
[ERROR]: std::bad_allocRAM used: 2.48998Gigabytes . Total Ram: 7.90855Gigabytes.

 has occurred in the MakeBiomCommand class function getMetadata. This error indicates your computer is running out of memory.  This is most commonly caused by trying to process a dataset too large, using multiple processors, or a file format issue. If you are running our 32bit version, your memory usage is limited to 4G.  If you have more than 4G of RAM and are running a 64bit OS, using our 64bit version may resolve your issue.  If you are using multiple processors, try running the command with processors=1, the more processors you use the more memory is required. Also, you may be able to reduce the size of your dataset by using the commands outlined in the Schloss SOP, http://www.mothur.org/wiki/Schloss_SOP. If you are unable to resolve the issue, please contact Pat Schloss at mothur.bugs@gmail.com, and be sure to include the mothur.logFile with your inquiry.

And this is what I get when I include the classify.otu command:

mothur > 
classify.seqs(fasta=bact_ntk_GG.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, template=gg_13_5.fasta, taxonomy=gg_13_5_taxonomy_fin.tax)

Using 4 processors.
Reading template taxonomy...     DONE.
Reading template probabilities...     DONE.
It took 18 seconds get probabilities. 
Classifying sequences from bact_ntk_GG.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta ...
[WARNING]: M02255_282_000000000-BYPTP_1_1101_4472_19743 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
....
[WARNING]: mothur reversed some your sequences for a better classification.  If you would like to take a closer look, please check bact_ntk_GG.trim.contigs.good.unique.good.filter.unique.precluster.pick.gg_13_5_taxonomy_fin.wang.flip.accnos for the list of the sequences.

It took 972 secs to classify 92786 sequences.


It took 5 secs to create the summary file for 92786 sequences.


Output File Names: 
bact_ntk_GG.trim.contigs.good.unique.good.filter.unique.precluster.pick.gg_13_5_taxonomy_fin.wang.taxonomy
bact_ntk_GG.trim.contigs.good.unique.good.filter.unique.precluster.pick.gg_13_5_taxonomy_fin.wang.tax.summary
bact_ntk_GG.trim.contigs.good.unique.good.filter.unique.precluster.pick.gg_13_5_taxonomy_fin.wang.flip.accnos


mothur > 
classify.otu(list=bact_ntk.trim.contigs.good.unique.good.filter.unique.precluster.pick.phylip.opti_mcc.list, count=bact_ntk.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, taxonomy=current, label=0.03)
Using bact_ntk_GG.trim.contigs.good.unique.good.filter.unique.precluster.pick.gg_13_5_taxonomy_fin.wang.taxonomy as input file for the taxonomy parameter.
0.03	23646

Output File Names: 
bact_ntk.trim.contigs.good.unique.good.filter.unique.precluster.pick.phylip.opti_mcc.0.03.cons.taxonomy
bact_ntk.trim.contigs.good.unique.good.filter.unique.precluster.pick.phylip.opti_mcc.0.03.cons.tax.summary


mothur > 
make.biom(shared=current, label=0.03, reftaxonomy=gg_13_5_taxonomy_fin.tax, constaxonomy=current, picrust=99_otu_map.txt)
Using bact_ntk.trim.contigs.good.unique.good.filter.unique.precluster.pick.phylip.opti_mcc.0.03.cons.taxonomy as input file for the constaxonomy parameter.
Using bact_ntk.trim.contigs.good.unique.good.filter.unique.precluster.pick.phylip.opti_mcc.0.03.subsample.shared as input file for the shared parameter.
0.03
[ERROR]: k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Paenibacillaceae;g__Brevibacillus; is not in taxonomy tree, please correct.

And I do not get any .biom file, just a ‘.sum’ file.

#10

Could you send your input files to mothur.bugs@gmail.com?
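
In the meantime, the "is not in taxonomy tree" message suggests that a lineage in the cons.taxonomy file has no match in the file given as reftaxonomy. The Python sketch below is one rough way to list such lineages before sending the files. It is not how mothur performs the check internally; it assumes the two-column reference taxonomy layout and the three-column cons.taxonomy layout, and the helper names and sample rows are hypothetical.

```python
import re


def lineage_prefixes(lineage):
    """Return every rank-level prefix of a semicolon-separated lineage."""
    ranks = [r for r in lineage.strip().split(";") if r]
    return {";".join(ranks[:i]) + ";" for i in range(1, len(ranks) + 1)}


def strip_confidence(lineage):
    """Remove bootstrap scores such as '(100)' from a lineage string."""
    return re.sub(r"\(\d+\)", "", lineage)


def missing_lineages(ref_lines, cons_lines):
    """List (OTU, lineage) pairs whose lineage is absent from the reference."""
    known = set()
    for ln in ref_lines:
        parts = ln.strip().split("\t")
        if len(parts) == 2:  # assumed layout: name <tab> lineage
            known |= lineage_prefixes(parts[1])
    missing = []
    for ln in cons_lines:
        parts = ln.strip().split("\t")
        # Skip malformed rows and the assumed OTU/Size/Taxonomy header.
        if len(parts) != 3 or parts[0].lower() == "otu":
            continue
        ranks = [r for r in strip_confidence(parts[2]).split(";") if r]
        if not ranks:
            continue
        lineage = ";".join(ranks) + ";"
        # Note: ranks labelled "unclassified" will also be reported here.
        if lineage not in known:
            missing.append((parts[0], lineage))
    return missing


# Invented sample rows; only the Brevibacillus lineage comes from the log above.
ref = ["ref1\tk__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Paenibacillaceae;g__Paenibacillus;\n"]
cons = [
    "OTU\tSize\tTaxonomy\n",
    "Otu0001\t50\tk__Bacteria(100);p__Firmicutes(100);c__Bacilli(100);\n",
    "Otu0002\t7\tk__Bacteria(100);p__Firmicutes(100);c__Bacilli(100);o__Bacillales(100);f__Paenibacillaceae(100);g__Brevibacillus(95);\n",
]
for otu, lineage in missing_lineages(ref, cons):
    print(otu, lineage)
```

Any lineage it prints is a candidate for the mismatch, which could point to the classification and the reftaxonomy coming from different versions of the reference.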