Jaccard distance to Jaccard tree

Hi,

I found a large discrepancy between the jclass output of summary.shared or dist.shared and the trees from tree.shared. The distance matrix generated with dist.shared gives exactly 1-jclass for the respective comparison in the summary.shared output, as expected. My problem is that the tree built with tree.shared based on jclass corresponds neither to the distance matrix nor to the summary.shared output. I expected that the samples with the smallest distance (highest similarity) in the distance matrix would cluster together in the tree. Instead, the two samples with the smallest Jaccard distance end up in different parts of the Jaccard tree. Am I misunderstanding what UPGMA does, or a more general part of the theory, or is this a bug?

Thank you for comments,
Fabian

This is the tree:
((((69.6M:0.316176,64.6M:0.316176):0.00804999,(68.6M:0.287879,82.6M:0.287879):0.0363477):0.0209049,80.6M:0.345131):0.0352789,((((63.6M:0.298173,(59.6M:0.280303,56.6M:0.280303):0.0178697):0.0131899,74.6M:0.311363):0.0167578,81.6M:0.328121):0.0424201,(67.6M:0.357143,65.6M:0.357143):0.0133978):0.00986964):0.11959;

65.6M and 63.6M should be very close in the tree, because their distance of 0.560606 is the smallest in the matrix, but they are not.

This is the distance matrix:
12
56.6M
59.6M 0.575758
63.6M 0.675676 0.699115
64.6M 0.756757 0.704225 0.771186
65.6M 0.695238 0.695238 0.560606 0.750000
67.6M 0.705128 0.705128 0.787402 0.714286 0.736842
68.6M 0.689655 0.689655 0.771429 0.878788 0.797980 0.814286
69.6M 0.666667 0.651685 0.648438 0.703297 0.630252 0.704082 0.785714
74.6M 0.732143 0.796610 0.566176 0.833333 0.678571 0.835938 0.796117 0.687500
80.6M 0.631579 0.683544 0.669492 0.756098 0.663636 0.738636 0.656250 0.659794 0.722689
81.6M 0.663265 0.663265 0.573643 0.656250 0.619048 0.710280 0.771739 0.610619 0.623077 0.657143
82.6M 0.639344 0.639344 0.752294 0.693548 0.714286 0.636364 0.725490 0.702381 0.809091 0.632353 0.736842
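The expectation stated above can be checked directly against the matrix. The following is a minimal Python sketch, not part of mothur; the function name `parse_lower_triangle` is made up for illustration. It reads the lower-triangle matrix as posted and reports the closest pair. In UPGMA that pair is joined first, with each of the two branches getting half the pairwise distance.

```python
# Lower-triangle PHYLIP matrix, copied from the post above.
MATRIX = """12
56.6M
59.6M 0.575758
63.6M 0.675676 0.699115
64.6M 0.756757 0.704225 0.771186
65.6M 0.695238 0.695238 0.560606 0.750000
67.6M 0.705128 0.705128 0.787402 0.714286 0.736842
68.6M 0.689655 0.689655 0.771429 0.878788 0.797980 0.814286
69.6M 0.666667 0.651685 0.648438 0.703297 0.630252 0.704082 0.785714
74.6M 0.732143 0.796610 0.566176 0.833333 0.678571 0.835938 0.796117 0.687500
80.6M 0.631579 0.683544 0.669492 0.756098 0.663636 0.738636 0.656250 0.659794 0.722689
81.6M 0.663265 0.663265 0.573643 0.656250 0.619048 0.710280 0.771739 0.610619 0.623077 0.657143
82.6M 0.639344 0.639344 0.752294 0.693548 0.714286 0.636364 0.725490 0.702381 0.809091 0.632353 0.736842
"""


def parse_lower_triangle(text):
    """Return (names, {frozenset({a, b}): distance}) for a lower-triangle matrix."""
    rows = [line.split() for line in text.strip().splitlines()]
    n = int(rows[0][0])
    names, dist = [], {}
    for row in rows[1:n + 1]:
        name, values = row[0], [float(v) for v in row[1:]]
        # The k-th value on a row is the distance to the k-th name seen so far.
        for other, d in zip(names, values):
            dist[frozenset((name, other))] = d
        names.append(name)
    return names, dist


names, dist = parse_lower_triangle(MATRIX)
pair, d = min(dist.items(), key=lambda item: item[1])
closest = sorted(pair)

print("closest pair:", closest)
print("distance:", d, "expected branch length of the first UPGMA join:", d / 2)
```

The half-distance of this pair matches the 0.280303 branch length in the tree, but the tree attaches that branch length to 59.6M and 56.6M rather than to 63.6M and 65.6M.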

Fabian,
I’m afraid I can’t reproduce what you’re seeing…

From mothur…
((((82.6M:0.316177,80.6M:0.316177):0.00804988,(59.6M:0.287879,56.6M:0.287879)
:0.0363474):0.0209049,68.6M:0.345131):0.035279,((((81.6M:0.298173,(65.6M
:0.280303,63.6M:0.280303):0.0178698):0.0131899,74.6M:0.311363):0.0167579,
69.6M:0.32812):0.0424202,(67.6M:0.357143,64.6M:0.357143):0.0133977):0.00986955)
:0.11959;

From neighbor using upgma (in phylip)…
(((((56.6M:0.28788,59.6M:0.28788):0.03635,(80.6M:0.31618,82.6M:0.31618):0.00805)
:0.02090,68.6M:0.34513):0.01386,((((63.6M:0.28030,65.6M:0.28030):0.01787,81.6M
:0.29817):0.01313,74.6M:0.31130):0.01080,69.6M:0.32210):0.03689):0.01295,(64.6M
:0.35714,67.6M:0.35714):0.01480);


Yours…
((((69.6M:0.316176,64.6M:0.316176):0.00804999,(68.6M:0.287879,82.6M:0.287879):0.0363477):0.0209049,80.6M:0.345131):0.0352789,((((63.6M:0.298173,(59.6M:0.280303,56.6M:0.280303):0.0178697):0.0131899,74.6M:0.311363):0.0167578,81.6M:0.328121):0.0424201,(67.6M:0.357143,65.6M:0.357143):0.0133978):0.00986964):0.11959;

You should be able to see in the mothur and neighbor trees that 65.6M and 63.6M do cluster together, and their branch lengths seem appropriate. Basically, what we're seeing is that your tree and the tree I generated in mothur have exactly the same topology but different labels. Can you please provide us with the exact commands you entered? Also, which version of mothur are you using? I don't think we've touched this in a while...

Pat
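The "same topology, different labels" observation can be checked mechanically. Here is a rough Python sketch using only the standard library; the helper names are hypothetical. It blanks out the leaf labels in both mothur trees and rounds branch lengths to four decimals, since the two runs differ slightly in the last digits. Pairing leaves by position is only meaningful here because the two trees are written in the same order.

```python
import re

# Fabian's tree and Pat's mothur tree, copied from the posts above.
FABIAN = (
    "((((69.6M:0.316176,64.6M:0.316176):0.00804999,(68.6M:0.287879,82.6M:0.287879)"
    ":0.0363477):0.0209049,80.6M:0.345131):0.0352789,((((63.6M:0.298173,(59.6M"
    ":0.280303,56.6M:0.280303):0.0178697):0.0131899,74.6M:0.311363):0.0167578,"
    "81.6M:0.328121):0.0424201,(67.6M:0.357143,65.6M:0.357143):0.0133978):0.00986964)"
    ":0.11959;"
)
PAT = (
    "((((82.6M:0.316177,80.6M:0.316177):0.00804988,(59.6M:0.287879,56.6M:0.287879)"
    ":0.0363474):0.0209049,68.6M:0.345131):0.035279,((((81.6M:0.298173,(65.6M"
    ":0.280303,63.6M:0.280303):0.0178698):0.0131899,74.6M:0.311363):0.0167579,"
    "69.6M:0.32812):0.0424202,(67.6M:0.357143,64.6M:0.357143):0.0133977):0.00986955)"
    ":0.11959;"
)

# A leaf label follows "(" or "," and is followed by ":".
LEAF = re.compile(r"(?<=[(,])[^(),:;]+(?=:)")
# A branch length is the number after a ":".
LENGTH = re.compile(r":([0-9.]+)")


def leaf_labels(newick):
    """Leaf labels in left-to-right order."""
    return LEAF.findall(newick)


def skeleton(newick, digits=4):
    """Tree with leaf labels replaced by X and branch lengths rounded."""
    unlabeled = LEAF.sub("X", newick)
    return LENGTH.sub(lambda m: f":{float(m.group(1)):.{digits}f}", unlabeled)


same_shape = skeleton(FABIAN) == skeleton(PAT)
# Position-wise pairing: label in Pat's tree -> label in Fabian's tree.
mapping = dict(zip(leaf_labels(PAT), leaf_labels(FABIAN)))

print("same topology and branch lengths:", same_shape)
for pat_label, fabian_label in mapping.items():
    print(pat_label, "->", fabian_label)
```

Among other pairs, the mapping shows 65.6M and 63.6M in Pat's tree sitting where Fabian's tree has 59.6M and 56.6M.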

Fabian, et al.

It looks like we have a small but potentially disastrous bug if you use the ordergroup option in read.otu or if you manually alter the ordering of the rows in the shared file. Hopefully this is the bug you have found. We hope to put out a release by the middle of next week that will include a fix for this as well as other changes to the program. If anyone would like a copy of this early, please shoot an email to mothur.bugs@gmail.com.

Thanks for catching this…
Pat

Hi Pat,

Thank you for the quick reply. Have you been able to localize the potential bug yet? Is it in the tree or in the distance matrix? Or might both be wrong? Is there a command to build the tree from the distance matrix in the .dist file instead of using tree.shared?


Here is the log. I repeated the procedure a moment ago and got the same discrepancy as before:

Windows version

Running 32Bit Version

mothur v.1.12.3
Last updated: 8/5/2010

by
Patrick D. Schloss

Department of Microbiology & Immunology
University of Michigan
pschloss@umich.edu
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type ‘help()’ for information on the commands that are available

Type ‘quit()’ to exit program
Interactive Mode


mothur > read.otu(list=6M.phylip.fn.list, group=6M.groups, label=0.10)
0.10

Output File Names:
6M.phylip.fn.82.6M.rabund
6M.phylip.fn.68.6M.rabund
6M.phylip.fn.56.6M.rabund
6M.phylip.fn.65.6M.rabund
6M.phylip.fn.59.6M.rabund
6M.phylip.fn.67.6M.rabund
6M.phylip.fn.80.6M.rabund
6M.phylip.fn.81.6M.rabund
6M.phylip.fn.74.6M.rabund
6M.phylip.fn.64.6M.rabund
6M.phylip.fn.63.6M.rabund
6M.phylip.fn.69.6M.rabund
6M.phylip.fn.shared


mothur > tree.shared()
0.10

Output File Names:
6M.phylip.fn.jclass.0.10.tre
6M.phylip.fn.thetayc.0.10.tre


mothur > dist.shared()
0.10

Output File Names:
6M.phylip.fn.jclass.0.10.lt.dist
6M.phylip.fn.thetayc.0.10.lt.dist

Can you send the list and group files to mothur.bugs@gmail.com?

Dear Pat,

Have you received the files you requested? I sent them on 8/28. Maybe they are stuck in your spam filter?

Fabian

Hi Fabian,

I was able to track down the bug. In version 1.12.3 we started sorting the groups alphabetically in the shared file. This introduced a bug in tree.shared when you build a tree from the shared file. It will be fixed in our next release. To work around this, you can build a tree from your .dist file using tree.shared or the clearcut command.

tree.shared(phylip=6M.phylip.fn.jclass.0.10.lt.dist)
tree.shared(phylip=6M.phylip.fn.thetayc.0.10.lt.dist)

or

clearcut(phylip=6M.phylip.fn.jclass.0.10.lt.dist)
clearcut(phylip=6M.phylip.fn.thetayc.0.10.lt.dist)

Sarah
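The effect of the alphabetical sorting can be illustrated with the labels posted in this thread. The Python sketch below is an inference from the posted data, not mothur's actual code. It assumes the unsorted group order is the order in which read.otu listed the .rabund files in the log above, and that each leaf was given the label at the same index in the unsorted list instead of the sorted one. The function name `mislabel` is hypothetical.

```python
# Group order as listed in the read.otu output in the log above.
# Assumption: this is the order the groups had before alphabetical sorting.
FILE_ORDER = [
    "82.6M", "68.6M", "56.6M", "65.6M", "59.6M", "67.6M",
    "80.6M", "81.6M", "74.6M", "64.6M", "63.6M", "69.6M",
]
SORTED_ORDER = sorted(FILE_ORDER)

# Leaf labels, left to right, in Pat's mothur tree and in Fabian's tree.
PAT_LEAVES = [
    "82.6M", "80.6M", "59.6M", "56.6M", "68.6M", "81.6M",
    "65.6M", "63.6M", "74.6M", "69.6M", "67.6M", "64.6M",
]
FABIAN_LEAVES = [
    "69.6M", "64.6M", "68.6M", "82.6M", "80.6M", "63.6M",
    "59.6M", "56.6M", "74.6M", "81.6M", "67.6M", "65.6M",
]


def mislabel(label):
    """Look up a group's index in the sorted list, then read the unsorted list."""
    return FILE_ORDER[SORTED_ORDER.index(label)]


relabeled = [mislabel(label) for label in PAT_LEAVES]

for correct, wrong in zip(PAT_LEAVES, relabeled):
    print(correct, "->", wrong)
print("reproduces Fabian's labels:", relabeled == FABIAN_LEAVES)
```

Under these assumptions the relabeling reproduces every leaf label in Fabian's tree, which is consistent with the matrix and branch lengths being right and only the names on the tree being wrong.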

Thank you for sorting this out!