Mothur should be able to read/write gzipped files

Mothur should be able to read and write gzipped files, both to save disk space and to lower the amount of data that needs to be read from disk.

With MiSeq data, mothur's performance would be improved considerably by the ability to read and write compressed files.
For instance, I have a 53 GB file of sequences aligned against the SILVA template, which takes only 800 MB when gzipped. Mothur spends a lot of its time reading and writing such big files.
A nice side effect would be the disk space savings, of course :wink:
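For what it's worth, zlib's gzFile interface makes this kind of transparent reading/writing fairly painless, since the same calls also handle plain uncompressed files. Here is a minimal sketch under that assumption; the file names, buffer size, and compression level are placeholders, not anything from mothur's code base:

```cpp
// Minimal sketch: stream a (possibly gzipped) file line by line with zlib.
// gzopen/gzgets also read plain-text files, so one loop covers both cases.
// File names and buffer size are placeholders, not mothur internals.
#include <zlib.h>
#include <cstdio>

int main() {
    gzFile in = gzopen("seqs.align.gz", "rb");          // also opens uncompressed files
    if (in == NULL) { std::fprintf(stderr, "could not open input\n"); return 1; }

    gzFile out = gzopen("seqs.filter.fasta.gz", "wb6");  // write with compression level 6
    if (out == NULL) { gzclose(in); return 1; }

    char buf[65536];
    while (gzgets(in, buf, sizeof(buf)) != NULL) {
        // ... process the line (e.g. filter alignment columns) ...
        gzputs(out, buf);
    }

    gzclose(in);
    gzclose(out);
    return 0;
}
```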

Flo - are you following the SOP, including using a region-specific silva.bacteria.fasta file?

Also, I’m sure that compressing 53 GB to 800 MB took a very long time, right? Anything we would do would incur that time penalty as well.

We’re looking into it…
Pat

I think this would be a good addition as well. I do most of my analyses on EC2, where it’s easy to get ample compute resources, but adding high I/O on top of that means additional hassle as well as more money. The instance I use 99% of the time has 16 processors/32 threads, but struggles to read or write to disk at 50 MB/s. Alignment speed is actually limited by I/O in this case.

If you’re worried about speed, you could use something like LZ4 (which compresses at roughly 400 MB/s per core), with the tradeoff that it is more obscure than gzip/zlib.
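Just to illustrate how simple the LZ4 block API (lz4.h) is to use, here's a rough one-shot compress/decompress sketch; the sequence data is made up and error handling is kept minimal:

```cpp
// Rough sketch of LZ4's one-shot block API; the input data is just a placeholder.
#include <lz4.h>
#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    const char* src = ">seq1\nACGTACGTACGTACGTACGTACGTACGT\n";
    const int srcSize = static_cast<int>(std::strlen(src));

    // LZ4_compressBound gives the worst-case compressed size for srcSize bytes.
    std::vector<char> compressed(LZ4_compressBound(srcSize));
    const int cSize = LZ4_compress_default(src, compressed.data(), srcSize,
                                           static_cast<int>(compressed.size()));
    if (cSize <= 0) { std::fprintf(stderr, "compression failed\n"); return 1; }

    // Round-trip to verify: decompress back into a buffer of the original size.
    std::vector<char> restored(srcSize);
    const int dSize = LZ4_decompress_safe(compressed.data(), restored.data(),
                                          cSize, srcSize);
    if (dSize != srcSize) { std::fprintf(stderr, "decompression failed\n"); return 1; }

    std::printf("%d bytes -> %d bytes compressed\n", srcSize, cSize);
    return 0;
}
```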