dist file difficulties

sammy · June 30, 2010, 1:12pm

Hi all
I know this topic has been covered extensively in this forum; nevertheless, I'm still having difficulties.
I'm trying to do a downstream analysis of 20 pyrosequencing libraries and to use one of the hypothesis-driven OTU analyses: unifrac or libshuff. The problem is that the column distance file doesn't work with the libshuff command (the name file, actually) or with neighbor.exe (from the PHYLIP package), and the square dist file is too big (> 5 GB). The only way I could think of is to restart the data analysis without creating a name file (no unique.seqs or pre.cluster) for use with unifrac, but I still can't create a tree using neighbor.exe from the PHYLIP package.
Is there an easier way to use read.dist for libshuff (including the name file)? And is it possible to create a tree using a column distance file?

pschloss · July 1, 2010, 12:50pm

So here’s two answers…

The hypothesis testing approaches are generally worthless, so don’t bother. You will probably have so much statistical power from 454 that the libraries will be statistically different. Remember all you get from these tests is a p-value which is akin to a yes or no answer. Not incredibly informative.
Assuming you don’t appreciate the cynicism or your advisor is breathing down your neck, you have a few options. First, do not use neighbor.exe - it is extremely slow and memory inefficient. Something like fasttree or clearcut (we have a wrapper for clearcut in mothur) would be better. As for the size of the distance matrix, I’m afraid you may be stuck. Make sure that you are simplifying the dataset as much as possible with judicious use of unique.seqs, filter.seqs, pre.cluster, and chimera.slayer. The column format will not gain you anything because you need all of the distances, and if you don’t use a cutoff, the column matrix ends up being larger than a phylip-formatted matrix. One option would be to do an OTU-based approach: identify representative sequences and then build trees from those to run through unifrac. I have some misgivings about that approach, but Knight et al. seem to favor it.
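
To make the size argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of mothur; the function names, the 10-character sequence names, and the 6-character distances are all assumptions chosen for illustration, and real files will differ. It only compares how many characters each text layout would need when every pairwise distance is kept (no cutoff).

```python
def pair_count(n):
    """Number of unordered sequence pairs among n sequences."""
    return n * (n - 1) // 2


def estimate_sizes(n, name_len=10, dist_chars=6):
    """Approximate file sizes in bytes for n sequences with no cutoff.

    Assumptions (hypothetical, for illustration only): every name is
    name_len characters, every distance is written with dist_chars
    characters, and each field is followed by one separator character.
    """
    pairs = pair_count(n)
    # Square matrix: one row per sequence, a name plus n distances.
    square = n * (name_len + 1 + n * (dist_chars + 1))
    # Lower triangle: one name per row, each pair stored once.
    lower_triangle = n * (name_len + 1) + pairs * (dist_chars + 1)
    # Column format: each pair stored once, but both names are repeated.
    column = pairs * (2 * (name_len + 1) + dist_chars + 1)
    return {"square": square, "lower_triangle": lower_triangle, "column": column}


if __name__ == "__main__":
    for label, size in estimate_sizes(30000).items():
        print(f"{label}: about {size / 1e9:.1f} GB")
```

Under these assumptions the column layout comes out larger than the square matrix because each line repeats two sequence names, which is why it only pays off when a cutoff discards most pairs. Reducing the number of unique sequences helps every layout, since the sizes grow roughly with the square of n.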

Hope this helps.
Pat

sammy · July 1, 2010, 1:18pm

Thanks a lot
