dist.seqs evolution model

nelsoncraige · July 3, 2011, 10:25pm

Hi,

I have searched extensively through the wiki, the original mothur pub in AEM, and Pat’s latest PLoS CB manuscript and I can’t seem to determine which model of nucleotide substitution is used in the dist.seqs command. All of the literature points out that the algorithm is identical to DNADIST from the PHYLIP package (with added flexibility in gap penalties) but DNADIST has variable options, one of which is the distance calculation model. The default model in PHYLIP is F84 (which the documentation for DNADIST notes incorporates some degree of maximum likelihood?) but my gut tells me that dist.seqs uses Jukes & Cantor nt substitution model for simplicity…some clarification would be much appreciated. Also, if there are other aspects of DNADIST which match or differ from the default implementation in PHYLIP when implemented in mothur it would be useful to note these as well in the documentation.

One final note: the wiki points the reader to the methods employed by Sogin et al 1995, but I could not find any papers with Sogin M as first author in 1995 via Google Scholar…perhaps this is a book chapter? Is there a list of references associated with the Wiki where I might find this reference?

Thanks as always

pschloss · July 5, 2011, 11:44am

In the PLoS CB paper I point out that most people use a model out of phylogenetic guilt. dist.seqs does not use a correction for multiple substitutions since none of them allow for gaps. phylip treats this as missing data, not an insertion/deletion. the sogin paper should be his first one using pyrosequencing and was published in PNAS. perhaps it’s 2006?

Pat

mikep1 · August 2, 2011, 6:12pm

Hi
I have a query about dist.seqs and amino acid sequences. Operating in ignorance, I got a distance file by processing an amino acid alignment in fasta format using the dist.seqs command. I then read posts in the forum e.g. “Using Mothur for protein sequences?!” and the wiki which implied protein distance matrices should be calculated outside mothur. Is this a real distance matrix , and if so, is this done using PROTDIST ?

Thanks

pschloss · August 3, 2011, 6:24pm

Yeah, dist.seqs should not be used for amino acid sequences. It will treat anything other than an A, T, G, C, or U as an N

gwidmer · December 1, 2011, 8:18pm

I have to agree with the last post that it is not obvious how distance is calculated. Maybe I overlooked it, but I looked around for a formula defining distance and couldn’t find it. Perhaps it would be feasible to include a link in the first sentence of the Dist.seqs page pointing to a definition.

thanks!

Giovanni

Topic		Replies	Views
Method used in the command dist.seqs Commands in mothur	4	1826	June 19, 2015
Distance computation algorithms Commands in mothur	1	28322	December 1, 2009
Dist.seqs Commands in mothur	1	2795	May 22, 2010
Older version of dist.seqs Commands in mothur	1	2273	December 22, 2011
pairwise.seqs mothur bugs	3	5113	March 4, 2011

dist.seqs evolution model

Related topics