There are a number of open-source toolkits with functionality similar to the TSV Utilities. Several are listed below:
- clarkgrubb/data-tools - A variety of tools, especially rich in format converters. Written in Python, Ruby, and C.
- csvkit - CSV tools, written in Python.
- csvtk - CSV tools, written in Go.
- GNU Datamash - Numeric, textual and statistical operations on TSV files. This tool has many similarities to tsv-summarize. Written in C.
- dplyr - Tools for tabular data in R storage formats. Runs in an R environment; the code is in C++.
- miller - Tools for CSV, JSON, and other formats. Written in C.
- GNU shuf, part of GNU Core Utils - Generates permutations of input lines. Sampling with and without replacement is supported. This tool has many of the same features as tsv-sample. Written in C.
- brendano/tsvutils - TSV tools, especially rich in format converters. Written in Python.
- xsv - CSV tools, written in Rust.
A much more comprehensive list of tools can be found in Structured text tools.
The different toolkits are certainly worth investigating if you work with tabular data files. Several have quite extensive feature sets. Each toolkit has its own strengths, and your workflow and preferences are likely to fit some toolkits better than others.
File format is perhaps the most important dimension. CSV files are very common. However, CSV files cannot be processed reliably by standard Unix tools. For this reason, CSV toolkit functionality typically extends into the space of traditional Unix tools. For example, CSV toolkits often have their own "sort" operation, as Unix sort does not operate reliably on CSV files. This is unfortunate, as creating a program with the speed and quality of a program like GNU sort is a meaningful undertaking.
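To illustrate why delimiter-based tools struggle with CSV, here is a minimal Python sketch. The sample data is hypothetical, and the plain string split only stands in for what a line-and-delimiter tool such as cut or sort would do; it is not a model of any specific tool.

```python
import csv
import io

# Hypothetical sample data: the second line has a comma inside a quoted field.
data = 'name,city\n"Smith, Jane",Boston\nAdams,Albany\n'

# Delimiter-only view: every comma is treated as a field separator.
naive = [line.split(",") for line in data.splitlines()]

# CSV-aware view: quoting rules are honored, so the embedded comma stays in the field.
parsed = list(csv.reader(io.StringIO(data)))

print(naive[1])   # ['"Smith', ' Jane"', 'Boston']
print(parsed[1])  # ['Smith, Jane', 'Boston']
```

The delimiter-only view sees three fields where the CSV reader sees two, so any sort or field selection keyed on the second column would operate on the wrong data.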
Many CSV toolkits also support TSV files, which is certainly appealing. Unfortunately, usage can be complicated and error prone due to the need to specify record delimiters and CSV-style escape rules. Another issue is that not all CSV toolkits support fully turning off CSV escape syntax. This is usually not obvious and can lead to subtle errors when processing TSV files containing quotes.
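The quote problem can be sketched with Python's standard csv module, used here only as a stand-in for a CSV toolkit reading tab-delimited input. The sample data is hypothetical. With the default settings, CSV quote handling stays active even though the delimiter is a tab; passing quoting=csv.QUOTE_NONE turns it off.

```python
import csv
import io

# Hypothetical TSV data: the first field starts with a literal quote character.
data = '"Big" sale\t10\nplain\t20\n'

# Tab delimiter, but CSV quote handling is still active (the default).
quoted_mode = list(csv.reader(io.StringIO(data), delimiter="\t"))

# Tab delimiter with quote handling disabled: quotes are ordinary data.
raw_mode = list(csv.reader(io.StringIO(data), delimiter="\t", quoting=csv.QUOTE_NONE))

print(quoted_mode[0])  # ['Big sale', '10']
print(raw_mode[0])     # ['"Big" sale', '10']
```

In the first case the quote characters are silently dropped from the data, with no error raised. This is the kind of subtle change that is easy to miss in a larger file.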
Tradeoffs between file formats are a topic of their own. The appropriate choice of format often depends on the specifics of the environment and the tasks being performed. See Comparing TSV and CSV formats for a discussion of the two formats. The brendano/tsvutils README (Brendan O'Connor) has a nice discussion of the rationale for using TSV files.