Error in v1.14 read.tree or unifrac.weighted

Hi all,

Unfortunately, it seems that an error has crept into mothur, in either read.tree or unifrac.weighted, sometime in the last few releases: generating a group distance matrix with UniFrac gives very different results in the current version (v1.14.0) than in the July 2010 version (v1.11.0).

My workflow is below. All relevant log, input, and output files are attached and described at the end of this post.
    read.tree(tree=Project.tre, group=Project.group, name=Project.name)
    unifrac.weighted(distance=true, groups=all, random=false)

I understand the structure of this dataset well, particularly the relationships between the groups, and I am quite sure that the UniFrac distances produced by v1.14 are incorrect. Besides the two versions of mothur producing different unifrac.weighted distance matrices, the v1.14 UniFrac matrix does not match the distance matrices from a Bray-Curtis comparison of groups after OTU clustering (all done in v1.14), whereas the v1.11 unifrac.weighted matrix does match these other ways of comparing groups. I have included a graph showing that the v1.14 UniFrac distances are uncorrelated with the v1.11 UniFrac distances and with Bray-Curtis distances calculated either in mothur or in PRIMER from mothur's get.relabund output.
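For readers following the comparison: Bray-Curtis dissimilarity between two groups is computed from their OTU abundance vectors as the sum of absolute per-OTU differences divided by the combined abundance. Below is a minimal Python sketch of that textbook formula; the function name is hypothetical, and whether mothur's braycurtis calculator in dist.shared uses exactly this form (raw counts rather than, say, relative abundances) is an assumption.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two OTU count vectors (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("OTU vectors must have the same length")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("both samples are empty")
    # Sum of absolute per-OTU differences, scaled by the combined abundance.
    return sum(abs(a - b) for a, b in zip(x, y)) / total


# Toy counts for three OTUs in two hypothetical groups.
print(bray_curtis([2, 0, 1], [1, 1, 1]))  # 2 / 6
```

A value of 0 means identical OTU profiles and 1 means no shared OTUs, which is why it is a reasonable independent check on a UniFrac matrix.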

I should note that read.tree runs about five times faster in v1.14.0 than in v1.11.0. In addition, UniFrac matrices computed in v1.14.0 on different subsets of this data agree well with one another. For these reasons, I suspect that read.tree is either parsing the group and name files incorrectly or failing to read the entire tree.

However, unifrac itself does appear to have been modified between the two versions: v1.14 prints a series of "Processing Combo: ####" lines while it runs, whereas v1.11 prints nothing.
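One way to test the read.tree hypothesis outside mothur is to expand the name file through the group file and count how many sequences each group should contribute, then compare those totals with what each version reports. A minimal Python sketch, assuming the usual tab-separated layouts (name file: representative, then comma-separated member names; group file: sequence name, then group); the function name is hypothetical and the sketch does not inspect the tree itself.

```python
from collections import Counter


def sequences_per_group(name_lines, group_lines):
    """Count sequences per group after expanding a name file (hypothetical helper).

    name_lines:  lines of 'representative<TAB>seq1,seq2,...'
    group_lines: lines of 'sequence<TAB>group'
    Returns (Counter mapping group -> sequence count, sorted names lacking a group).
    """
    group_of = {}
    for line in group_lines:
        if line.strip():
            seq, group = line.split()
            group_of[seq] = group

    per_group = Counter()
    missing = set()
    for line in name_lines:
        if not line.strip():
            continue
        _rep, members = line.split()
        # Every member of a name-file row is one sequence that read.tree should count.
        for seq in members.split(","):
            if seq in group_of:
                per_group[group_of[seq]] += 1
            else:
                missing.add(seq)
    return per_group, sorted(missing)
```

With the attached files this could be run as `sequences_per_group(open("Project.name"), open("Project.group"))`; a non-empty list of missing names, or totals that differ from the group sizes mothur reports, would point at input parsing rather than at the UniFrac calculation.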

Any help would be appreciated,
Thanks, Craig Nelson

The files are available for the next 7 days at the following address:
http://www.lifesci.ucsb.edu/filedist/CJsP7M66/UnifracErrorNelson.zip

  1. Input files for read.tree (Project.tre, Project.group, Project.name).

  2. The unifrac.weighted group distance files (Project.outputv14.weighted.dist and Project.outputv11.weighted.dist), generated today by unifrac.weighted in v1.14 and v1.11, respectively, from the same input files. The log files and wsummary files for these runs are included as well.

  3. A Bray-Curtis distance matrix (Project.an.braycurtis.0.03.lt.dist) generated from the same input fasta used to build the tree (built with FastTree). This matrix was generated with the following commands:
    dist.seqs(fasta=Project.fasta,cutoff=0.12)
    read.dist(column=Project.dist,name=Project.names)
    cluster(hard=t,method=average,cutoff=0.03)
    read.otu(list=Project.an.list,group=Project.groups,label=0.03)
    dist.shared(label=0.03,calc=braycurtis)

  4. A graph (Graph.png) of Mantel test correlations among the various distance matrices, showing that the v1.14 unifrac.weighted matrix is an outlier, uncorrelated with the other matrices. The graph also includes a distance matrix built in PRIMER-E v6 from the get.relabund(label=0.03) output, as a check on the dist.shared matrix (the two correspond well).
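For reference, a basic Mantel test correlates the off-diagonal entries of two distance matrices and judges significance by permuting the rows and columns of one matrix together. The post does not say which variant produced Graph.png, so the minimal stdlib Python sketch below assumes a Pearson statistic and a one-sided permutation p-value; the function names are hypothetical.

```python
import math
import random


def _upper_triangle(m):
    """Entries above the diagonal of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p): Pearson r between two distance matrices and a one-sided p-value."""
    rng = random.Random(seed)
    x = _upper_triangle(d1)
    r_obs = _pearson(x, _upper_triangle(d2))
    n = len(d2)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of d2 together so it stays a valid distance matrix.
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if _pearson(x, _upper_triangle(permuted)) >= r_obs:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)
```

Both matrices must list the groups in the same order, and a lower-triangular file such as Project.an.braycurtis.0.03.lt.dist would first need to be expanded into a square list of lists.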

Thanks for your help in finding this bug, and for providing such a detailed report. In version 1.14.0 we changed how the unweighted and weighted values are calculated, to correct comparisons in which the root of the subtree spanned by the compared groups is not the root of the entire tree. In doing so, we inadvertently introduced an error into the weighted calculation. The fix will be part of 1.15, which will be released in a couple of weeks. Thanks again for your help.
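To illustrate why the choice of root matters, consider the standard normalized weighted UniFrac formula: u = sum over branches of b_i * |A_i/A_T - B_i/B_T|, divided by D = sum over leaves of d_j * (A_j/A_T + B_j/B_T), where b_i is a branch length, A_i and B_i are the two groups' sequence counts below that branch, and d_j is a leaf's distance from the root. Branches above the two groups' common ancestor add nothing to u, but they inflate D if d_j is measured from the root of the whole tree. The minimal Python sketch below assumes this formula; the tree encoding, the function name, and the `from_subtree_root` switch are hypothetical, and this is not mothur's code or a description of the 1.15 fix.

```python
from collections import defaultdict


def weighted_unifrac(tree, counts, group_a, group_b, from_subtree_root=True):
    """Normalized weighted UniFrac between two groups (illustrative sketch).

    tree:   dict node -> (parent node or None, length of branch to parent)
    counts: dict leaf -> {group name: number of sequences at that leaf}
    """
    total_a = sum(c.get(group_a, 0) for c in counts.values())
    total_b = sum(c.get(group_b, 0) for c in counts.values())

    # Integer count of each group's sequences found below every node.
    below_a = defaultdict(int)
    below_b = defaultdict(int)
    for leaf, c in counts.items():
        node = leaf
        while node is not None:
            below_a[node] += c.get(group_a, 0)
            below_b[node] += c.get(group_b, 0)
            node = tree[node][0]

    u = 0.0  # branch-length-weighted difference in group abundance
    d = 0.0  # normalizer: sum of the two groups' mean leaf depths
    for node, (parent, length) in tree.items():
        if parent is None:
            continue  # the root has no branch above it
        fa = below_a[node] / total_a
        fb = below_b[node] / total_b
        u += length * abs(fa - fb)
        # A branch holding every sequence of both groups lies at or above their
        # common ancestor; skipping it measures depth from that subtree root.
        above_subtree = below_a[node] == total_a and below_b[node] == total_b
        if not (from_subtree_root and above_subtree):
            d += length * (fa + fb)
    return u / d


# Toy tree: root -> x (2.0) -> L1, L2 (1.0 each); root -> L3 (1.0).
TOY_TREE = {
    "root": (None, 0.0),
    "x": ("root", 2.0),
    "L1": ("x", 1.0),
    "L2": ("x", 1.0),
    "L3": ("root", 1.0),
}
# Groups A and B sit only under x; group C (not compared) sits at L3.
TOY_COUNTS = {"L1": {"A": 2}, "L2": {"B": 2}, "L3": {"C": 1}}

print(weighted_unifrac(TOY_TREE, TOY_COUNTS, "A", "B"))  # 1.0
print(weighted_unifrac(TOY_TREE, TOY_COUNTS, "A", "B", from_subtree_root=False))  # 0.333...
```

In the toy tree, A and B share no leaves, so the subtree-rooted value is the maximum of 1.0; measuring depth from the whole-tree root adds the unrelated root-to-x branch to D and pulls the same comparison down to one third.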