Hi all,
Both the <phylo.diversity> and unifrac commands require the user to read in a tree. I therefore generated an NJ tree from a PHYLIP-formatted distance matrix produced in mothur.
Immediately before generating the distance file I ran a final <unique.seqs> command with the names option. This gave me a unique.fasta file and a unique.names file, both with the same number of rows (i.e. clones). However, at this point I also have a *.groups file that was generated BEFORE the last unique.seqs step, so the groups file has more rows (i.e. clones) than either the names or the fasta file.
This is not an issue for other mothur commands, but it is an issue for the <phylo.diversity> and unifrac commands. Here mothur automatically generates a new group file containing only the clones that appear in the distance matrix, effectively eliminating identical clones from different groups.
Finally, to my question. If the final unique.seqs step collapses identical clones from different groups into a single clone, won’t this affect the results of phylo.diversity and unifrac? It seems that I should generate a separate distance matrix, and hence a separate NJ tree, from a non-dereplicated dataset (i.e. one that has not been through the final unique.seqs step) so that this information is not lost. Am I making sense?
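To make the concern concrete, here is a minimal Python sketch of what I think is happening. It is only my own illustration, not mothur's code: the clone and group names are made up, and I am assuming the usual two-column, tab-separated layouts (names file: representative, then a comma-separated list of every clone it stands for; groups file: clone, then group). It compares the group membership you get when each representative is expanded through the names file with what you get when only the representative's own group is kept.

```python
from collections import Counter

# Hypothetical toy data, NOT from a real run.
# Names file: representative <TAB> comma-separated clones it represents.
NAMES_TEXT = "cloneA1\tcloneA1,cloneB1,cloneB2\ncloneA2\tcloneA2\n"
# Groups file (made before the last unique.seqs): clone <TAB> group.
GROUPS_TEXT = (
    "cloneA1\tgroupA\n"
    "cloneA2\tgroupA\n"
    "cloneB1\tgroupB\n"
    "cloneB2\tgroupB\n"
)


def parse_names(text):
    """Map each representative clone to the list of clones it represents."""
    names = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        rep, members = line.split("\t")
        names[rep] = members.split(",")
    return names


def parse_groups(text):
    """Map each clone to its group."""
    groups = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        clone, group = line.split("\t")
        groups[clone] = group
    return groups


def expanded_group_counts(names, groups):
    """Per representative, count clones per group using the names file."""
    return {
        rep: Counter(groups[member] for member in members)
        for rep, members in names.items()
    }


def representative_only_groups(names, groups):
    """Per representative, keep only the representative's own group."""
    return {rep: groups[rep] for rep in names}


names = parse_names(NAMES_TEXT)
groups = parse_groups(GROUPS_TEXT)
expanded = expanded_group_counts(names, groups)
rep_only = representative_only_groups(names, groups)

for rep in names:
    print(rep, "expanded:", dict(expanded[rep]), "rep only:", rep_only[rep])
```

In this toy case cloneA1 also stands for two groupB clones, but if only the representative's own group is kept, groupB disappears from that tip of the tree. That is the information I am worried about losing.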