phylip distance matrix vs column distance matrix

Hi all,

I am running my first pyrosequencing analysis following the Schloss SOP tutorial. The number of OTUs I obtain varies a lot depending on whether I calculate them from a column-formatted or a phylip-formatted distance matrix (337 vs. 607 in my case). Is this related to unique sequences?
By the way, which text editor do you use to open the list file generated from a column-formatted distance matrix when the file is very large?

Thanks to anyone who can help me with these questions.

There shouldn’t be a difference between the two methods if you’re comparing the same OTU cutoff. Are you comparing the same cutoff? Can you email us the input alignment that you are giving to dist.seqs so we can take a look?


Thanks for sending the files. The problem is that you are running…



cluster(column=final.dist, name=final.names)

By default, cluster uses the average neighbor algorithm, which uses the abundance information provided by the names file. For a fair comparison you should run…

cluster(phylip=final.phylip.dist, name=final.names)
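If you want to check that you really are comparing the same cutoff in both runs without opening the big list files in an editor, a small script can read them line by line. Below is a minimal Python sketch, not part of mothur. It assumes the usual list file layout: one tab-separated line per label, with the label in the first column, the number of OTUs in the second, and one OTU per remaining column (newer mothur versions also write a header row starting with "label"). The function names and the file names in the usage part are made up for illustration.

```python
import sys


def otu_counts(lines):
    """Return {label: number of OTUs} from the lines of a mothur list file.

    Assumed layout (tab-separated): label, numOtus, then one OTU per column.
    Works on any iterable of lines, so a large file is never loaded whole.
    """
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines and the header row written by newer mothur versions.
        if len(fields) < 2 or fields[0] == "label":
            continue
        # Count the OTU columns directly instead of trusting column two.
        counts[fields[0]] = len([f for f in fields[2:] if f])
    return counts


def compare(column_lines, phylip_lines):
    """Pair up OTU counts for the labels present in both list files."""
    col = otu_counts(column_lines)
    phy = otu_counts(phylip_lines)
    return {label: (col[label], phy[label]) for label in col if label in phy}


if __name__ == "__main__":
    # Hypothetical usage: python compare_lists.py column.list phylip.list
    with open(sys.argv[1]) as col_fh, open(sys.argv[2]) as phy_fh:
        for label, (n_col, n_phy) in compare(col_fh, phy_fh).items():
            print(f"{label}\tcolumn={n_col}\tphylip={n_phy}")
```

Labels that appear in only one of the two files are left out of the output, so a cutoff missing from one run is easy to spot.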


CGabriel, I use Jujuedit, which opens files of up to 2 GB in a matter of seconds. It has some nice features, and you can use regular expressions to edit text!