More compact column distance format

How about this:

Sequence1
Sequence2 distance12
Sequence3 distance13
Sequence4 distance14

Sequence2
Sequence3 distance23
Sequence4 distance24

Sequence3
Sequence4 distance34
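
To make the layout concrete, here is a rough Python sketch of writing and reading it. The function names `write_grouped` and `read_grouped` are made up for illustration and are not part of mothur. It assumes sequence names contain no whitespace and that the input pairs arrive already grouped by their first sequence.

```python
def write_grouped(pairs):
    """Render (name_a, name_b, distance) tuples in the grouped layout.

    Assumes the pairs are already ordered so that all rows sharing the
    same name_a are adjacent.
    """
    lines = []
    current = None
    for name_a, name_b, dist in pairs:
        if name_a != current:
            if current is not None:
                lines.append("")      # blank line between groups
            lines.append(name_a)      # group header: the first sequence
            current = name_a
        lines.append(f"{name_b} {dist}")
    return "\n".join(lines) + "\n"


def read_grouped(text):
    """Parse the grouped layout back into (name_a, name_b, distance) tuples."""
    pairs = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 1:
            current = fields[0]       # a lone name starts a new group
        elif len(fields) == 2:
            pairs.append((current, fields[0], float(fields[1])))
        # blank lines are only visual separators and are skipped
    return pairs
```

Each first-sequence name is written once per group instead of once per row, which is where the saving over a three-column file comes from.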

We could make it even smaller by replacing the sequence names with indexes into the original fasta file, but that would require passing both the distance file and the fasta file to the subsequent read.dist command, and it would create the risk of forgetting which fasta file was used to generate the distances. You could mitigate the latter by including the fasta filename in the distance output, which is not a bad idea in its own right. See the previous forum member’s suggestion of writing the actual command used to generate each of mothur’s output files into the output file itself.
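
A minimal sketch of the index idea, again in Python and again with made-up names (`fasta_names`, `write_indexed`, `read_indexed`). The `#fasta=` header line is my own assumption about how the filename could be recorded, not an existing mothur convention.

```python
def fasta_names(fasta_text):
    """Return sequence names in file order (first word after each '>')."""
    return [
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]


def write_indexed(pairs, names, fasta_filename):
    """Write distances using positions in the fasta file instead of names.

    The first line records which fasta file the indexes refer to
    (hypothetical '#fasta=' header).
    """
    index = {name: i for i, name in enumerate(names)}
    lines = [f"#fasta={fasta_filename}"]
    for name_a, name_b, dist in pairs:
        lines.append(f"{index[name_a]} {index[name_b]} {dist}")
    return "\n".join(lines) + "\n"


def read_indexed(text, names):
    """Recover the fasta filename and the named pairs from the indexed text."""
    lines = text.splitlines()
    fasta_filename = lines[0].split("=", 1)[1]
    pairs = []
    for line in lines[1:]:
        i, j, dist = line.split()
        pairs.append((names[int(i)], names[int(j)], float(dist)))
    return fasta_filename, pairs
```

The reader still needs the name list from the fasta file, which is exactly the extra dependency described above; the header only tells you which file to go and find.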

The dist file could also be stored in a binary format, since I don’t know many people who manually peruse distance files :slight_smile:
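
For a sense of what binary could look like, here is one possible record layout using Python's `struct` module. The layout (two unsigned 32-bit sequence indexes plus a 32-bit float, little-endian) is purely my assumption, and the function names are hypothetical. Note that a 32-bit float keeps only about seven significant digits.

```python
import struct

# One record: index of sequence A, index of sequence B, distance.
# "<IIf" = little-endian, two unsigned 32-bit ints, one 32-bit float.
RECORD = struct.Struct("<IIf")


def pack_distances(indexed_pairs):
    """Pack (index_a, index_b, distance) tuples into fixed-size records."""
    return b"".join(RECORD.pack(i, j, dist) for i, j, dist in indexed_pairs)


def unpack_distances(blob):
    """Unpack the records back into a list of (index_a, index_b, distance)."""
    return list(RECORD.iter_unpack(blob))
```

Every pair takes 12 bytes regardless of how long the sequence names are, at the cost of needing the fasta file (or a separate name table) to turn indexes back into names.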

Of course, if you’re on an NTFS volume with compression turned on, file size matters less, but a smaller format could still help some people.

Robin