Hi,
I am analyzing a large pyrosequencing dataset and would like to use libshuff to compare samples.
libshuff must be preceded by a read.dist command to read in the distances, which I understand.
However, all my attempts to use the smaller, column-formatted distance files (which can benefit from a cutoff) fail. For example, after
read.dist(column=mydata.unique.filter.dist,name=mydata.names,group=mydata.groups)
running libshuff returns the error:
“You must read in a matrix and groupfile using the read.dist command, before you use the libshuff command.”
So, following the libshuff instructions, I go back to dist.seqs and create a phylip-formatted distance matrix.
This seems to work, but there is no provision for a cutoff when using the phylip output (presumably because every element in the matrix must be included), so the phylip distance matrix is very large (~1 GB) and the computation becomes extremely slow. Is there a workaround, or is what I described the intended way to do this?
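To illustrate what I mean about the size difference, here is a rough Python sketch (not mothur code). The sequence names and distances are made up, the helper names are my own, and I am assuming a cutoff keeps only pairs at or below the threshold:

```python
def pair_count(n_seqs):
    """Number of distinct sequence pairs, i.e. entries in a lower-triangle matrix."""
    return n_seqs * (n_seqs - 1) // 2


def rows_kept(distances, cutoff):
    """Number of column-format rows left if only pairs at or below the cutoff are written."""
    return sum(1 for d in distances.values() if d <= cutoff)


# Hypothetical pairwise distances among four made-up sequences.
toy = {
    ("a", "b"): 0.02, ("a", "c"): 0.30, ("a", "d"): 0.45,
    ("b", "c"): 0.28, ("b", "d"): 0.50, ("c", "d"): 0.05,
}

n = 4
print(pair_count(n))         # 6 entries in a lower-triangle matrix
print(n * n)                 # 16 entries in a square matrix
print(rows_kept(toy, 0.10))  # 2 rows survive a 0.10 cutoff
```

The matrix entry count grows with the square of the number of sequences no matter how distant most pairs are, whereas the column file only grows with the number of close pairs.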
Thanks!