read.tree - mothur assumptions about Newick format?

Hi there,

I’m having difficulty loading a phylogeny in Newick format using the read.tree command. On Mac OS X 10.5.8, with mothur version 1.6.0 (compiled with 64-bit flags) on a server with 24GB RAM, I get a segmentation fault when I attempt to load a phylogeny with approximately 15,000 tips. The tree was generated by FastTree; it has internal node labels, polytomies, and branch lengths, and it is unrooted. The phylogeny loads fine in other software, including Dendroscope and R’s ape package.

I couldn’t find documentation for the read.tree command that specified whether any of the above features of the tree would be expected to cause a crash of mothur. Does mothur support unrooted trees with internal node labels? If there is some mismatch between the tree and the groups file, would that be expected to cause a segmentation fault?

Thanks for any insight you can provide,

So mothur will gag if you have internal node labels or internal polytomies. It’s fine if the tree is not rooted. In short, mothur assumes that you have a bifurcating tree that may or may not have branch lengths and may or may not be rooted. If this is something people are interested in having, cool, we’ll work on it. I know that some people have been moving from Clearcut to FastTree, so this may be a bit of an issue for some.
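For anyone who wants to check a tree against these assumptions before handing it to read.tree, here is a minimal sketch in plain Python. It is not part of mothur; the function name `check_newick` is made up for illustration. It assumes unquoted labels and no bracketed comments in the Newick string. The top-level node is counted like any other, so an unrooted tree written with three children at the top will show up as one polytomy (the thread does not say whether mothur accepts that particular case).

```python
import re

def check_newick(text):
    """Count labelled internal nodes and polytomies in a Newick string.

    Hypothetical helper, not a mothur function. Assumes unquoted labels
    and no [comments]. Scans tokens without recursion, so deep trees
    are not a problem.
    """
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", text)
    child_counts = []        # one child counter per currently open "("
    labelled_internal = 0
    polytomies = 0
    prev = None
    for tok in tokens:
        if tok == "(":
            child_counts.append(1)
        elif tok == ",":
            child_counts[-1] += 1
        elif tok == ")":
            # Closing a node: more than two children means a polytomy.
            if child_counts.pop() > 2:
                polytomies += 1
        elif tok not in ":;" and prev == ")":
            # A name or support value directly after ")" labels an internal node.
            labelled_internal += 1
        prev = tok
    return labelled_internal, polytomies

if __name__ == "__main__":
    tree = "((A:0.1,B:0.2)0.95:0.3,C:0.4,D:0.5);"
    print(check_newick(tree))  # (labelled internal nodes, polytomies)
```

If either count is non-zero, the tree has a feature described above as unsupported.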

I, for one, am having a hard time accepting trees that are built with these heuristics, are this large, and are based on relatively short sequences. But if the people speak…

Removing internal node labels and resolving polytomies to bifurcating nodes allowed the phylogeny file to be loaded with read.tree(), so that was indeed the issue.
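The cleanup described here can be sketched in plain Python. This is only an illustration, not necessarily the procedure used for the tree above, and the names `clean_newick`, `_parse` and `_write` are made up. It assumes unquoted labels and no bracketed comments. Polytomies are resolved by grouping children under new branches of length `zero` (pass `zero=None` for trees without branch lengths). The grouping order is arbitrary, and a three-way split at the top level is resolved too, which effectively picks a root.

```python
import re

def _parse(text):
    """Build nested dict nodes from a Newick string with an explicit stack."""
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", text)
    new = lambda: {"children": [], "label": "", "length": None}
    root = new()
    stack, current, prev = [], root, None
    for tok in tokens:
        if tok == "(":                 # descend into a first child
            child = new()
            current["children"].append(child)
            stack.append(current)
            current = child
        elif tok == ",":               # start the next sibling
            current = new()
            stack[-1]["children"].append(current)
        elif tok == ")":               # back up to the parent
            current = stack.pop()
        elif tok == ";":
            break
        elif tok != ":":
            if prev == ":":
                current["length"] = tok
            else:
                current["label"] = tok
        prev = tok
    return root

def _write(root):
    """Serialize nodes back to Newick without recursion."""
    out, stack = [], [(root, 0)]       # (node, index of next child to emit)
    while stack:
        node, i = stack.pop()
        kids = node["children"]
        if i < len(kids):
            out.append("(" if i == 0 else ",")
            stack.append((node, i + 1))
            stack.append((kids[i], 0))
        else:
            if kids:
                out.append(")")
            out.append(node["label"])
            if node["length"] is not None:
                out.append(":" + node["length"])
    return "".join(out)

def clean_newick(text, zero="0.0"):
    """Drop internal node labels and make every internal node bifurcating."""
    root = _parse(text)
    todo = [root]
    while todo:
        node = todo.pop()
        kids = node["children"]
        if kids:
            node["label"] = ""         # remove support value / internal name
            while len(kids) > 2:
                # Join the last two children under a new zero-length branch.
                b, a = kids.pop(), kids.pop()
                kids.append({"children": [a, b], "label": "", "length": zero})
            todo.extend(kids)
    return _write(root) + ";"

if __name__ == "__main__":
    print(clean_newick("((A:0.1,B:0.2)0.95:0.3,C:0.4,D:0.5);"))
    # prints ((A:0.1,B:0.2):0.3,(C:0.4,D:0.5):0.0);
```

Because the inserted branches have zero length, distances between tips are unchanged; only the topology becomes strictly binary.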

Quantifying the uncertainty inherent in all phylogenetic reconstruction methods instead of ignoring it seems highly desirable, and polytomies and internal node labels indicating nodal support values are one way to allow the representation of this uncertainty. Polytomies can also simply indicate the presence of identical sequences, which is the case for the phylogeny I am analyzing.

Phylogenies containing polytomies and internal node labels are sometimes found in consensus and bootstrap summary phylogenies generated by FastTree, MrBayes, RAxML, PAUP*, PHYLIP, etc., so it would be useful (to me at least) if mothur could deal with these features of the Newick standard. A note in the manual that they are not supported would also suffice since it’s easy enough in most cases to remove node labels and work directly with multiple binary trees instead of a consensus tree.

Thanks for the speedy reply and for creating mothur, it’s extremely useful.

I agree with your point of assessing the uncertainty of a tree; however, what do you want to do with a consensus tree in mothur? There aren’t any branch lengths [right?], so the unifrac tools would be out and if you have polytomies, parsimony analysis would be out as well. I’m just trying to get a handle on things to prioritize where to put this in “the list”.

re: branch lengths on a consensus tree, true enough! I think node labels that indicate support values and polytomies that indicate identical sequences are common in phylogenies produced by several software packages, but these are easy to remove prior to analysis in mothur. So if adding support for labels and polytomies would require substantial work, I would say it doesn’t need to be a priority.