v1.6.0 Release

@jondegenhardt jondegenhardt released this 28 Mar 04:29
v1.6.0
f9c0ef7

To download and unpack prebuilt binaries:

$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.6.0/tsv-utils-v1.6.0_linux-x86_64_ldc2.tar.gz | tar xz

$ # macOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.6.0/tsv-utils-v1.6.0_osx-x86_64_ldc2.tar.gz | tar xz

Installation instructions are in the ReleasePackageReadme.txt file in the release package.

To be notified of new releases:

GitHub supports notification of new releases. Click the "Watch" button on the repository page and select "Releases Only".

Release 1.6.0 Changes:

  • Prebuilt binaries have been updated to use the latest LDC compiler (1.20.1).

  • tsv-select: New feature, the ability to exclude fields (PR #267).

    Fields to exclude are specified with the --e|exclude option. Examples:

    $ # Drop the first field, keep everything else.
    $ # Equivalent to `cut -f 2- file.tsv`
    $ tsv-select --exclude 1 file.tsv
    
    $ # Drop fields 3-10, keep everything else.
    $ tsv-select --exclude 3-10 file.tsv
    

    See the tsv-select reference for more information.
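
To make the `cut -f 2-` equivalence mentioned above concrete, here is a minimal sketch that runs without tsv-utils installed. The sample file, its contents, and the `workdir` variable are made up for illustration. The `tsv-select` command appears only in a comment and is expected, per the note above, to give the same result.

```shell
# Hypothetical three-field sample file in a scratch directory.
workdir=$(mktemp -d)
printf 'id\tname\tcolor\n1\tapple\tred\n2\tpear\tgreen\n' > "$workdir/sample.tsv"

# Drop the first field and keep everything else, using cut.
cut -f 2- "$workdir/sample.tsv" > "$workdir/without_first.tsv"

# With tsv-utils installed, this should print the same lines:
#   tsv-select --exclude 1 "$workdir/sample.tsv"
cat "$workdir/without_first.tsv"
```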

  • New tool: tsv-split (PR #270)

    tsv-split is used to split one or more input files into multiple output files. There are three modes of operation:

    • Fixed number of lines per file (--l|lines-per-file NUM): Each input block of NUM lines is written to a new file. This is similar to the Unix split utility.

    • Random assignment (--n|num-files NUM): Each input line is written to a randomly selected output file. Random selection is from NUM files.

    • Random assignment by key (--n|num-files NUM, --k|key-fields FIELDS): Input lines are written to output files using fields as a key. Each unique key is randomly assigned to one of NUM output files. All lines with the same key are written to the same file.

    Examples:

    $ # Split a file into files of 10,000 lines each.
    $ tsv-split data.txt --lines-per-file 10000 --dir split_files
    
    $ # Split a file into 1000 files with lines randomly assigned.
    $ tsv-split data.txt --num-files 1000 --dir split_files
    
    $ # Randomly assign lines to 1000 files using field 3 as a key.
    $ tsv-split data.tsv --num-files 1000 --key-fields 3 --dir split_files
    

    See the tsv-split reference for more information.
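
The behavior of two of the modes can be illustrated with standard Unix tools. This is a rough sketch of the ideas only, not how tsv-split is implemented. The sample data, directory names, and output file names (`chunk_*`, `part_N.tsv`) are assumptions made for illustration and do not reflect tsv-split's own naming.

```shell
# Hypothetical five-line sample file; field 3 is the key.
workdir=$(mktemp -d)
printf 'a\tx\tk1\nb\ty\tk2\nc\tz\tk1\nd\tw\tk3\ne\tv\tk2\n' > "$workdir/data.tsv"
mkdir "$workdir/by_lines" "$workdir/by_key"

# Fixed number of lines per file: each block of 2 lines goes to a new file,
# shown here with the Unix split utility.
split -l 2 "$workdir/data.tsv" "$workdir/by_lines/chunk_"

# Random assignment by key: the first time a key is seen it is assigned a
# random bucket in 0..n-1; later lines with that key reuse the same bucket.
awk -F'\t' -v n=3 -v dir="$workdir/by_key" '
  BEGIN { srand() }
  {
    key = $3
    if (!(key in bucket)) bucket[key] = int(rand() * n)
    print > (dir "/part_" bucket[key] ".tsv")
  }' "$workdir/data.tsv"

ls "$workdir/by_lines" "$workdir/by_key"
```

Which keys share a file varies from run to run, but every line with a given key always lands in a single file.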