In the MiSeq SOP, cluster.split is suggested as an alternative to dist.seqs/cluster.
I prefer this approach because, even with very low cutoffs, the distance matrices from dist.seqs take up a lot of disk space.
However, to use unifrac you need a phylogenetic tree, and to make a phylogenetic tree you do need to run dist.seqs.
Is there no way at all to avoid this?
For instance, RAxML can take any alignment (I generally use it on mothur NAST alignments) and construct a phylogenetic tree without polluting the disk.
Or is this the reason the SOP mentions “this process gets messy as your number of sequences increases”?
Yeah, this gets messy with more sequences. The problem is that you have to build a tree from all of those sequences unless you do some weird mapping procedure. If you can build a tree with RAxML, go for it.
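For anyone landing on this thread, here is a minimal sketch of what building the tree with RAxML might look like when driven from Python. It only assembles the command line and does not run anything. The file name `final.phy`, the run name `miseq_tree`, the model and the seed are placeholders made up for illustration, and it assumes the standard `raxmlHPC` binary (or `raxmlHPC-PTHREADS` when threads are requested) with its `-s`, `-n`, `-m`, `-p` and `-T` options.

```python
import shlex


def raxml_tree_command(alignment, run_name, model="GTRGAMMA", seed=12345, threads=None):
    """Assemble (but do not run) a RAxML maximum-likelihood tree search command.

    alignment: path to the alignment file (hypothetical name in the example below)
    run_name:  suffix RAxML appends to its output files
    model:     substitution model passed to -m
    seed:      random seed for the parsimony starting tree, passed to -p
    threads:   if given, use the PTHREADS binary with -T
    """
    binary = "raxmlHPC" if threads is None else "raxmlHPC-PTHREADS"
    cmd = [binary, "-s", alignment, "-n", run_name, "-m", model, "-p", str(seed)]
    if threads is not None:
        cmd += ["-T", str(threads)]
    return cmd


# Print a shell-ready version of the command for a hypothetical alignment.
print(shlex.join(raxml_tree_command("final.phy", "miseq_tree")))
```

If the run finishes, RAxML should write its best-scoring tree to a file named `RAxML_bestTree.miseq_tree`, which is the file you would then hand to the unifrac commands. The sequence names in the tree have to match the ones in your group or count file.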
Hi, thanks for your reply. I was afraid there was indeed no other option.
I am not very familiar with the licensing of RAxML, but could RAxML be integrated into mothur?
In my humble opinion, phylogeny (not taxonomy) is an interesting measure for assessing some hypotheses on NGS data (implicitly, OTU binning is a kind of “phylogenetic” binning too).
RAxML is fast and has several HPC extensions (MPI, …). Additionally, for Illumina MiSeq read lengths, I think the EPA (evolutionary placement algorithm, http://sysbio.oxfordjournals.org/content/60/3/291.full) could be interesting.
Or am I missing something here?
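To make the EPA idea a bit more concrete, here is a rough sketch of how such a placement run might be assembled from Python. It assumes RAxML's `-f v` mode, an alignment that contains both the reference sequences and the short reads, and a reference tree that contains only the reference sequences. All file names (`refs_plus_reads.phy`, `reference.tre`) and the run name `placement` are hypothetical, and the `GTRCAT` model is just a commonly used default rather than a recommendation.

```python
import shlex


def raxml_epa_command(alignment, reference_tree, run_name, model="GTRCAT"):
    """Assemble (but do not run) a RAxML EPA read-placement command.

    alignment:      alignment holding reference sequences plus query reads
    reference_tree: tree built from the reference sequences only
    run_name:       suffix RAxML appends to its output files
    model:          substitution model passed to -m
    """
    return [
        "raxmlHPC",
        "-f", "v",             # EPA: place query sequences into the reference tree
        "-s", alignment,
        "-t", reference_tree,
        "-m", model,
        "-n", run_name,
    ]


# Print a shell-ready version of the command for hypothetical input files.
print(shlex.join(raxml_epa_command("refs_plus_reads.phy", "reference.tre", "placement")))
```

The appeal here is that only the reference tree is searched in full, and each read is placed onto it individually, so no all-against-all distance matrix of the reads is needed.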