I am analyzing a large pyrosequencing dataset and would like to use libshuff to compare samples.
I understand that libshuff must be preceded by a read.dist command that reads in the distances.
However, all my attempts to use the smaller, column-formatted distance files (which can benefit from a cutoff) end with this error:
“You must read in a matrix and groupfile using the read.dist command, before you use the libshuff command.”
So, following the libshuff instructions, I go back to dist.seqs and create a phylip-formatted distance matrix.
This seems to work, but there is no provision for a cutoff when using the phylip output (presumably because every element of the matrix must be included). As a result, the phylip distance matrix is very large (~1 GB) and the computation becomes extremely slow. Is there a workaround, or is what I described the intended way to go about this?
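To put rough numbers on why the phylip file grows so large, here is a minimal Python sketch. It assumes a lower-triangle phylip matrix stores one value per sequence pair, n(n-1)/2 in total, and that a column file written with a cutoff keeps only the pairs at or below that cutoff. The sequence count (15,000) and the ~9 bytes per value are hypothetical figures chosen for illustration, not taken from my dataset, and the function names are made up for this sketch.

```python
from typing import Iterable, Tuple

# Hypothetical sketch comparing how many distances each format must store.
# Assumptions: lower-triangle PHYLIP = one value per sequence pair;
# column file = only (seq1, seq2, distance) rows with distance <= cutoff.


def phylip_pair_count(n_seqs: int) -> int:
    """Pairwise distances in a lower-triangle matrix: n*(n-1)/2, independent of any cutoff."""
    return n_seqs * (n_seqs - 1) // 2


def column_pair_count(distances: Iterable[Tuple[str, str, float]], cutoff: float) -> int:
    """Rows a cutoff-filtered column file would keep (assumes '<=' semantics)."""
    return sum(1 for _a, _b, d in distances if d <= cutoff)


def estimated_bytes(pairs: int, bytes_per_value: int = 9) -> int:
    """Rough file size, assuming ~9 characters per value (e.g. '0.123456' plus a separator)."""
    return pairs * bytes_per_value


if __name__ == "__main__":
    n = 15_000  # hypothetical number of unique sequences
    full = phylip_pair_count(n)
    # prints "15000 sequences -> 112,492,500 distances, about 1.01 GB"
    print(f"{n} sequences -> {full:,} distances, about {estimated_bytes(full) / 1e9:.2f} GB")

    # Toy column-format rows: only pairs at or below the cutoff survive.
    rows = [("seqA", "seqB", 0.01), ("seqA", "seqC", 0.25), ("seqB", "seqC", 0.04)]
    print(column_pair_count(rows, cutoff=0.10))  # prints 2
```

The phylip count grows quadratically with the number of sequences no matter how similar they are, whereas the column file only grows with the number of pairs under the cutoff. Each column row is longer than a bare phylip value because it repeats both sequence names, so the saving depends on how strict the cutoff is.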