Commit

Make --compression-level=1 the default
Closes #808
marcelm committed Nov 12, 2024
1 parent 6530114 commit 8d67c7c
Showing 4 changed files with 22 additions and 13 deletions.
11 changes: 11 additions & 0 deletions CHANGES.rst
@@ -5,6 +5,17 @@ Changelog
development version
-------------------

+* :issue:`808`: Made gzip compression level 1 the default, which improves
+  runtime significantly in many cases. (Compressing the output is often a
+  bottleneck when using multiple threads.) Output files will be larger, but because
+  Cutadapt is typically used in an intermediate step whose output is deleted
+  afterwards, this often has no impact on final disk usage.
+  Suggested by @rhpvorderman (many times). Use ``--compression-level=N`` with
+  ``N`` greater than 1 to get higher compression at the cost of speed.
+  (``N=5`` is the old default.)
+  Option ``-Z`` (equivalent to ``--compression-level=1``) is now deprecated.
+* The previously hidden option ``--compression-level`` is now shown in the
+  ``--help`` output.
* :issue:`820`: On Bioconda, Cutadapt is now also available for ARM64 Macs (M1/M2).
* Dropped support for Python 3.8.
* Added support for Python 3.13.
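
The speed/size trade-off described in the :issue:`808` entry can be tried out with a small, self-contained sketch. It uses Python's standard ``gzip`` module on synthetic FASTQ-like data; this is an assumption made for illustration, not Cutadapt's actual output path, and the helper names ``make_fastq`` and ``measure`` are hypothetical. Timings and sizes depend on the machine and the data, so no expected numbers are given.

```python
import gzip
import random
import time


def make_fastq(n_records: int, read_length: int = 100, seed: int = 0) -> bytes:
    """Build synthetic FASTQ-like records (hypothetical test data)."""
    rng = random.Random(seed)
    records = []
    for i in range(n_records):
        seq = "".join(rng.choice("ACGT") for _ in range(read_length))
        qual = "".join(rng.choice("FGHI") for _ in range(read_length))
        records.append(f"@read{i}\n{seq}\n+\n{qual}\n")
    return "".join(records).encode("ascii")


def measure(data: bytes, level: int) -> tuple[int, float]:
    """Return (compressed size in bytes, elapsed seconds) for one gzip level."""
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=level)
    elapsed = time.perf_counter() - start
    return len(compressed), elapsed


data = make_fastq(2000)
# Level 1 is the new default, level 5 the old one.
for level in (1, 5):
    size, seconds = measure(data, level)
    print(f"level {level}: {size} bytes in {seconds:.4f} s")
```

Real sequencing data compresses differently from random bases, so treat the output only as a rough indication of the direction of the trade-off.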
15 changes: 6 additions & 9 deletions doc/recipes.rst
@@ -191,19 +191,12 @@ Speeding up Cutadapt

There are several tricks for limiting wall-clock time while using Cutadapt.

-Option ``-Z`` (equivalent to ``--compression-level=1``) can be used to limit the
-amount of CPU time which is spent on the compression of output files.
-Alternatively, choosing filenames not ending with ``.gz``, ``.bz2`` or ``.xz``
-will make sure no CPU time is spent on compression at all. On systems
-with slow I/O, it can actually be faster to set a higher compression-level
-than 1.
-
Increasing the number of cores with ``-j`` will increase the number of reads per
minute at near-linear rate.

-It is also possible to use pipes in order to bypass the filesystem and pipe
+It is possible to use pipes in order to bypass the filesystem and pipe
Cutadapt's output into an aligner such as BWA. The ``mkfifo`` command allows
-you to create named pipes in bash.
+you to create named pipes.

.. code-block::bash
@@ -214,6 +207,10 @@ you to create named pipes in bash.
This command will run cutadapt and BWA simultaneously, using Cutadapt’s output as
BWA’s input, and capturing Cutadapt’s report in ``cutadapt.log``.

+.. versionadded:: 4.10
+   Option ``-Z`` (equivalent to ``--compression-level=1``), which was earlier
+   recommended for speeding up processing, is now the default.


Check whether a FASTQ file is properly formatted
------------------------------------------------
2 changes: 2 additions & 0 deletions doc/reference.rst
@@ -348,6 +348,8 @@ Output
Use this option to force FASTA even in such a case.

``-Z``
+**Deprecated**: This option has become the default in Cutadapt 4.10.
+
Use compression level 1 for gzipped output files.
This is a shorthand for ``--compression-level=1``.

7 changes: 3 additions & 4 deletions src/cutadapt/cli.py
@@ -186,9 +186,6 @@ def get_argument_parser() -> ArgumentParser:
# Buffer size for the reader process when running in parallel
group.add_argument("--buffer-size", type=int, default=4000000,
help=SUPPRESS)
-# Compression level for gzipped output files. Not exposed since we have -Z
-group.add_argument("--compression-level", type=int, default=5,
-    help=SUPPRESS)
# Disable adapter index creation
group.add_argument("--no-index", dest="index", default=True, action="store_false", help=SUPPRESS)

@@ -344,8 +341,10 @@ def get_argument_parser() -> ArgumentParser:
"Default: write to standard output")
group.add_argument("--fasta", default=False, action='store_true',
help="Output FASTA to standard output even on FASTQ input.")
+group.add_argument("--compression-level", type=int, default=1, metavar="N",
+    help="Compression level for compressed output files. Default: %(default)s")
group.add_argument("-Z", action="store_const", const=1, dest="compression_level",
-    help="Use compression level 1 for gzipped output files (faster, but uses more space)")
+    help="DEPRECATED because compression level 1 is now the default.")
group.add_argument("--info-file", metavar="FILE",
help="Write information about each read and its adapter matches into FILE. "
"See the documentation for the file format.")
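
How the two options interact after this change can be checked with a reduced, stand-alone parser. This is only a sketch of the two relevant arguments, not Cutadapt's full `get_argument_parser()`; the function name `build_parser` and the program name `sketch` are hypothetical. Both options write to the same `compression_level` destination, so `-Z` is now a no-op unless it follows an explicit `--compression-level`.

```python
from argparse import ArgumentParser


def build_parser() -> ArgumentParser:
    """Reduced parser holding only the two compression options (illustrative)."""
    parser = ArgumentParser(prog="sketch")
    parser.add_argument(
        "--compression-level", type=int, default=1, metavar="N",
        help="Compression level for compressed output files. Default: %(default)s",
    )
    # -Z stores the constant 1 into the same destination as --compression-level.
    parser.add_argument(
        "-Z", action="store_const", const=1, dest="compression_level",
        help="DEPRECATED because compression level 1 is now the default.",
    )
    return parser


parser = build_parser()
print(parser.parse_args([]).compression_level)                         # 1
print(parser.parse_args(["-Z"]).compression_level)                     # 1
print(parser.parse_args(["--compression-level=5"]).compression_level)  # 5
```

Because `--compression-level` is registered first, its default of 1 is the one that applies when neither option is given.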
