Galaxy supports fastq.gz files. For anyone interested in very fast gzip compression I recommend checking out ISA-L, which comes with an igzip application that decompresses and compresses much faster than standard gzip.
Much faster in this case means 3x faster decompression and 6x faster compression. It is available on conda-forge and can be installed with conda install -c conda-forge isa-l.
The good news is that there are also Python bindings available (python-isal). I wrote these myself, and an extensive test suite is used to ensure that they work properly. The Python bindings are now used by xopen and, by extension, cutadapt.
Using python-isal will make decompression a lot faster. For compression there is a small tradeoff: files will be slightly larger, because ISA-L does not support very high compression levels (though it still compresses better than gzip level 1).
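To give an idea of what using the bindings looks like, here is a minimal sketch. It assumes python-isal is installed and relies on its gzip-compatible `isal.igzip` module, falling back to the standard library `gzip` module otherwise. The helper names `write_demo_fastq` and `count_fastq_records` are made up for illustration and are not part of Galaxy or python-isal.

```python
import gzip
import os
import tempfile

try:
    # python-isal ships a gzip-compatible module (assumed to be installed).
    from isal import igzip as gz_impl
except ImportError:
    # Fall back to the standard library so the sketch still runs without it.
    gz_impl = gzip


def write_demo_fastq(path, n_records):
    """Write a tiny synthetic FASTQ file (hypothetical helper).

    Compression level 1 is used because it is valid for both backends.
    """
    with gz_impl.open(path, "wt", compresslevel=1) as handle:
        for i in range(n_records):
            handle.write(f"@read{i}\nACGT\n+\nIIII\n")


def count_fastq_records(path):
    """Count records in a gzip-compressed FASTQ file (4 lines per record)."""
    with gz_impl.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.fastq.gz")
    write_demo_fastq(demo_path, 100)
    print(count_fastq_records(demo_path))
```

Because the output is ordinary gzip, files written this way stay readable by any other gzip implementation.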
EDIT: I am willing to implement this myself if there is interest. Also, I forgot to mention that python-isal has no dependencies (the C library is statically linked), so there is no dependency hell.
I see isal is now a hard dependency due to your work on #17342
I see the current gz to uncompressed converter uses gzip -dcf. However, since python-isal is required, python -m isal.igzip should also be available.
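As a rough sketch of how a converter could pick between the two commands: the function below builds the `python -m isal.igzip -cd` invocation when the `isal` package can be found and otherwise keeps the existing `gzip -dcf` call. The names `decompress_command` and `decompress_to_file` are hypothetical; this is not how Galaxy's converter is actually written.

```python
import importlib.util
import subprocess
import sys


def decompress_command(path):
    """Return an argv list that writes the decompressed file to stdout.

    Hypothetical helper: prefers python-isal's command-line module when the
    package is importable, otherwise falls back to plain gzip.
    """
    if importlib.util.find_spec("isal") is not None:
        return [sys.executable, "-m", "isal.igzip", "-cd", path]
    return ["gzip", "-dcf", path]


def decompress_to_file(src, dest):
    """Run the chosen command and redirect its stdout into dest."""
    with open(dest, "wb") as out:
        subprocess.run(decompress_command(src), stdout=out, check=True)
```

Using `sys.executable` keeps the call inside the same Python environment, which is where python-isal would be installed.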
To illustrate the difference I decompress a 1.6GB fastq file here:
Benchmark 1: python -m isal.igzip -cd ~/test/5millionreads_R1.fastq.gz > /dev/null
Time (mean ± σ): 2.008 s ± 0.011 s [User: 1.956 s, System: 0.051 s]
Range (min … max): 1.997 s … 2.028 s 10 runs
Benchmark 1: gzip -cd ~/test/5millionreads_R1.fastq.gz > /dev/null
Time (mean ± σ): 8.162 s ± 0.080 s [User: 8.103 s, System: 0.058 s]
Range (min … max): 8.093 s … 8.375 s 10 runs
4 times faster! By the way, this is mostly due to gzip's code, not to zlib. If I use the pigz implementation on one thread the decompression is also faster than gzip:
Benchmark 1: pigz -p 1 -cd ~/test/5millionreads_R1.fastq.gz > /dev/null
Time (mean ± σ): 4.123 s ± 0.025 s [User: 4.076 s, System: 0.047 s]
Range (min … max): 4.089 s … 4.173 s 10 runs
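For anyone who wants to check this in-process rather than on the command line, a small timing sketch follows. It compares `gzip.decompress` from the standard library with `isal.igzip.decompress` when python-isal is installed. The synthetic payload and the helper names are assumptions for illustration, and the timings will not match the figures above, which were measured on a real 1.6GB file.

```python
import gzip
import time


def time_decompress(decompress, payload, repeats=3):
    """Return the best wall-clock time over a few decompression runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        decompress(payload)
        best = min(best, time.perf_counter() - start)
    return best


def compare_backends(payload):
    """Time the stdlib backend, plus python-isal if it is available."""
    timings = {"gzip (stdlib)": time_decompress(gzip.decompress, payload)}
    try:
        from isal import igzip  # assumed installed; skipped otherwise
    except ImportError:
        return timings
    timings["isal.igzip"] = time_decompress(igzip.decompress, payload)
    return timings


if __name__ == "__main__":
    # Synthetic, highly repetitive FASTQ-like data; real reads compress differently.
    record = b"@read\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
    payload = gzip.compress(record * 200_000, compresslevel=1)
    for name, seconds in compare_backends(payload).items():
        print(f"{name}: {seconds:.4f} s")
```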