I’m working with sequences from several different samples that I want to compare using unifrac. I previously did the analysis with a tree file built using the tree.shared() command, but I recently went back and built the tree file using clearcut() instead, and the subsequent unifrac gave a different result. It doesn’t bother me what the true result is (whether the communities are significantly different or not), but it worries me that I can get different results from the same data depending on how I build the tree. Reading through the mothur wiki, I’ve seen both methods used for performing unifrac, so I don’t see a clear preference for either one. Can anyone enlighten me as to which results to trust?
Also, on a general note, is there a good rule of thumb for the ideal sequence length when building the distance matrix for these sorts of analyses? I ask because there seems to be a bit of a judgement call when screening the aligned sequences. I suppose it’s a choice between keeping more sequences in the data set and keeping fewer, longer sequences to analyze.
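
To make that trade-off concrete, here is a minimal Python sketch of what I mean. It is not a mothur command: the function name survivors_by_min_length and the toy lengths are made up purely for illustration. It just counts how many sequences would be kept at each candidate minimum-length cutoff.

```python
def survivors_by_min_length(lengths, thresholds):
    """Count how many sequences are at least as long as each cutoff."""
    return {t: sum(1 for n in lengths if n >= t) for t in thresholds}


# Hypothetical (made-up) sequence lengths, not real data
toy_lengths = [180, 210, 240, 250, 255, 300, 310, 400]

counts = survivors_by_min_length(toy_lengths, [200, 250, 300])
for cutoff, kept in counts.items():
    print(f"min length {cutoff}: {kept} of {len(toy_lengths)} sequences kept")
```

Raising the cutoff gives longer sequences to build the distance matrix from, but fewer of them, which is exactly the judgement call I’m unsure about.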