More compact column distance format

laalaa99stl · February 3, 2010, 2:10pm

How about this:

Sequence1
Sequence2 distance12
Sequence3 distance13
Sequence4 distance14

Sequence2
Sequence3 distance23
Sequence4 distance24

Sequence3
Sequence4 distance34
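To make the proposal concrete, here is a rough Python sketch that converts ordinary three-column rows (name, name, distance) into the grouped layout above and back again. The function names `to_grouped` and `from_grouped` are made up for illustration and are not part of mothur; the sketch assumes sequence names contain no whitespace and that a blank line ends each group.

```python
def to_grouped(column_lines):
    """Group 'seqA seqB dist' rows under a single 'seqA' header line."""
    groups = {}  # dicts keep insertion order, so groups stay in input order
    for line in column_lines:
        if not line.strip():
            continue
        first, second, dist = line.split()
        groups.setdefault(first, []).append((second, dist))

    out = []
    for first, pairs in groups.items():
        out.append(first)  # header: the shared first sequence name
        out.extend(f"{second} {dist}" for second, dist in pairs)
        out.append("")  # blank line closes the group
    return out


def from_grouped(grouped_lines):
    """Expand the grouped layout back into 'seqA seqB dist' rows."""
    rows = []
    current = None
    for line in grouped_lines:
        fields = line.split()
        if not fields:
            current = None  # blank line: group finished
        elif len(fields) == 1:
            current = fields[0]  # a lone name starts a new group
        else:
            rows.append(f"{current} {fields[0]} {fields[1]}")
    return rows


column = [
    "Sequence1 Sequence2 0.1",
    "Sequence1 Sequence3 0.2",
    "Sequence2 Sequence3 0.3",
]
grouped = to_grouped(column)
restored = from_grouped(grouped)
```

The saving comes from writing each first name once per group instead of once per row, so it grows with the length of the sequence names.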

We can make it even smaller by replacing the sequence names with an index into the original fasta file, but that would require passing both the distance and fasta files into the subsequent read.dist command and would create the risk of forgetting which fasta file was used to generate the distances. You could mitigate the latter by including the fasta filename in the distance output – not a bad idea in its own right. See the previous forum member’s suggestion of outputting the actual command used to generate each of mothur’s output files in the output file itself.
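A minimal sketch of the index idea, again in Python and again hypothetical: the `#fasta=` header line and the function names are my own invention, not an existing mothur convention. It assumes the index is the zero-based position of the sequence in the fasta file and that the name is the first word after `>`.

```python
def fasta_names(fasta_lines):
    """Return sequence names in file order (first word after '>')."""
    return [line[1:].split()[0] for line in fasta_lines if line.startswith(">")]


def to_indexed(column_lines, fasta_lines, fasta_filename):
    """Replace names with fasta positions; record the fasta file in a header."""
    index = {name: i for i, name in enumerate(fasta_names(fasta_lines))}
    out = [f"#fasta={fasta_filename}"]  # hypothetical header naming the source file
    for line in column_lines:
        a, b, dist = line.split()
        out.append(f"{index[a]} {index[b]} {dist}")
    return out


def from_indexed(indexed_lines, fasta_lines):
    """Restore names; needs the same fasta file that produced the indices."""
    names = fasta_names(fasta_lines)
    rows = []
    for line in indexed_lines:
        if line.startswith("#"):
            continue  # skip the header line
        i, j, dist = line.split()
        rows.append(f"{names[int(i)]} {names[int(j)]} {dist}")
    return rows


fasta = [">Sequence1 some description", "ACGT", ">Sequence2", "ACGA", ">Sequence3", "ACGG"]
column = ["Sequence1 Sequence2 0.1", "Sequence2 Sequence3 0.3"]
indexed = to_indexed(column, fasta, "example.fasta")
restored = from_indexed(indexed, fasta)
```

Note that reading the file back is only correct if the fasta file is unchanged, which is exactly why the filename in the header is worth having.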

The dist file could also be made into a binary format, since I don’t know too many people who manually peruse distance files.
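For the binary idea, one possible record layout (purely an assumption on my part, not anything mothur reads or writes) is two 32-bit sequence indices followed by a 32-bit float, little-endian, for 12 bytes per pair. A quick sketch with Python's standard `struct` module:

```python
import struct

# Hypothetical record: two unsigned 32-bit indices + one 32-bit float,
# little-endian with no padding, so each pair takes exactly 12 bytes.
RECORD = struct.Struct("<IIf")


def pack_records(records):
    """Serialize (index_a, index_b, distance) tuples into bytes."""
    return b"".join(RECORD.pack(i, j, d) for i, j, d in records)


def unpack_records(data):
    """Read the tuples back from the packed bytes."""
    return [RECORD.unpack_from(data, offset)
            for offset in range(0, len(data), RECORD.size)]


records = [(0, 1, 0.5), (1, 2, 0.25)]
data = pack_records(records)
restored = unpack_records(data)
```

A 32-bit float keeps roughly seven significant digits, so whether that is enough precision for the distances is a design choice that would need checking.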

Of course, if you’re on an NTFS system with compression turned on, it matters less, but it could help some people.

Robin
