make.biom output inconsistent with make.shared output (?)

Hello,

Could you please help me with making biom files for PICRUSt? I followed the 454 SOP for quality control and alignment of my sequences, and after that I ran classify.seqs (with gg_13_8_99.gg.tax), remove.lineage, dist.seqs, cluster, and:

make.shared(list=final.an.list, group=final.groups, label=0.03)
sub.sample(shared=final.an.shared)
classify.otu(list=final.an.list, name=final.names, taxonomy=final.taxonomy, label=0.03)
make.biom(shared=final.an.0.03.subsample.shared, label=0.03, reftaxonomy=gg_13_8_99.gg.tax, constaxonomy=final.an.0.03.cons.taxonomy, picrust=97_otu_map.txt)

However, when I compare my subsampled shared file with the shared biom file there are some strong differences:

(1) Size: each sample has 1732 sequences in the subsampled shared file, but the samples in the biom_shared file have very different depths, from around 3000 to over 130000.

(2) Another strange difference is the distribution of OTUs. As an example, here is a set of samples that are known to have one very abundant OTU and zero or very low counts in the other OTUs. The subsampled shared file shows this pattern, but the biom_shared file is completely different:

First few lines of final.an.0.03.subsample.shared (I edited it in Excel to make it easier to read):

label 0.03 0.03 0.03 0.03 0.03 0.03 0.03
Group mg_1 mg_2 mg_3 mg_4 midgut_1 midgut_6 midgut_7
numOtus 219 219 219 219 219 219 219
Otu0001 1726 1680 1715 1729 1726 1724 1720
Otu0002 0 0 0 0 0 0 0
Otu0003 0 0 0 0 0 0 0
Otu0004 0 0 0 0 0 0 0
Otu0005 0 22 1 1 0 0 1
Otu0006 0 1 0 0 0 0 0
Otu0007 0 0 2 0 0 0 0
Otu0008 0 0 1 0 2 0 0
Otu0009 0 0 0 0 0 0 0
Otu0010 0 0 0 0 0 0 0

First few lines of final.an.0.03.subsample.0.03.biom_shared:

label 0.03 0.03 0.03 0.03 0.03 0.03 0.03
Group mg_1 mg_2 mg_3 mg_4 midgut_1 midgut_6 midgut_7
numOtus 100 100 100 100 100 100 100
100001 18987 18502 18866 19020 18986 18964 18921
10001 1 0 0 0 0 0 0
1000113 0 0 0 0 0 0 0
1000148 13808 13442 13722 13832 13808 13792 13760
1000161 0 0 0 0 0 0 0
1000735 0 0 0 0 0 0 0
1000757 0 0 0 0 0 0 0
1000876 1726 1680 1715 1729 1726 1724 1720
100095 1726 1680 1715 1729 1726 1724 1720
100100 0 0 0 0 0 0 0

I have no idea what could be going wrong. I have searched this forum and other forums with no luck. Any help would be greatly appreciated! :)

*I’m using mothur 1.37.0.

Are both files created using subsampled data? If not, that would cause a number of problems. If that doesn’t explain the problems, could you forward your shared file and what you think the biom file should look like to mothur.bugs@gmail.com and include a link to this post?

Pat

Thank you, Pat! I just sent my files to mothur.bugs.

Great question! Let me explain what mothur is doing. When the biom file is created for PICRUSt, mothur does several things:

  1. For each OTU, the Greengenes OTUID is identified. This ID is determined by the taxonomy assigned to the OTU and the mapping file provided.
  2. PICRUSt does not allow duplicate OTUIDs in the biom file, so any OTUs that classify to the same taxonomy are merged. Within mothur we do not merge OTUs with the same classification, because we form OTUs based on distance and preserve their distinction.
  3. To avoid confusion between the new biom file and the original shared file, mothur creates a biom_shared file that represents the newly merged PICRUSt OTUs.
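
To make the merging in step 2 concrete, here is a minimal Python sketch. It is not mothur's actual code: the function name `merge_otus_by_id`, the IDs `id_A` and `id_B`, and the counts are all made up for illustration, and the lookup from OTU to reference ID (which mothur derives from the taxonomy and the mapping file) is simply assumed to be given as a dictionary.

```python
def merge_otus_by_id(counts, otu_to_id):
    """Sum the per-sample counts of all OTUs that share a reference ID.

    counts:    dict of OTU label -> list of per-sample counts
    otu_to_id: dict of OTU label -> reference ID (hypothetical lookup result)
    """
    merged = {}
    for otu, row in counts.items():
        ref_id = otu_to_id[otu]
        if ref_id in merged:
            # Another OTU already mapped to this ID: add counts sample by sample.
            merged[ref_id] = [a + b for a, b in zip(merged[ref_id], row)]
        else:
            merged[ref_id] = list(row)
    return merged


# Toy data, not taken from the files in this thread.
counts = {
    "Otu0001": [10, 8, 9],
    "Otu0002": [0, 2, 1],
    "Otu0003": [5, 5, 5],
}
otu_to_id = {"Otu0001": "id_A", "Otu0002": "id_A", "Otu0003": "id_B"}

merged = merge_otus_by_id(counts, otu_to_id)
print(merged)  # {'id_A': [10, 10, 10], 'id_B': [5, 5, 5]}
```

In this sketch the three OTUs collapse to two rows, and the per-sample totals are the same before and after merging. Whether the real biom_shared file keeps the same totals depends on how mothur assigns IDs, which this sketch does not model.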