v1.1.16: Profile guided optimization; New sampling methods
NOTE: Pre-built binaries for this release are no longer available. Please use binaries from the latest release.
Changes in v1.1.16:
The main changes in this release are the use of Profile Guided Optimization (PGO) and the addition of new sampling methods in tsv-sample.
Profile Guided Optimization - This is a follow-on to the Link Time Optimization work done in v1.1.15. It is based on LDC compiler support for LTO and PGO, including the ability to operate on the application code and the D standard libraries (druntime, phobos) together.
Profile Guided Optimization uses data collected from instrumented builds to better optimize executables. The tsv utilities build process has been updated to generate and use instrumentation for several of the tools. LTO and PGO builds are enabled by options passed to make. The pre-built binaries available from the GitHub releases page are built with LTO and PGO, but both must be enabled explicitly when building from source. See Building with Link Time Optimization and Profile Guided Optimization for details.
PGO results in material performance gains (10% or more) on csv2tsv and tsv-summarize, and smaller gains (2-5%) on several other tools. Considering LTO (v1.1.15) and PGO (v1.1.16) combined, performance gains on five of six measured benchmarks ranged from 8-45% on Linux, and 6-57% on MacOS. Three of the benchmarks saw gains greater than 25% on both platforms.
New sampling methods - Two sampling methods have been added to tsv-sample. One is a simple stream sampling mode that selects a random portion of an input stream based on a sampling rate. The other is a form of sampling known as "distinct" sampling, which selects a random portion of records based on a key in the data. For example, if records contain an IP address, distinct sampling can be used to take all records from 1% of the unique IP addresses. See the tsv-sample reference for details.
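The difference between the two modes can be illustrated with a small Python sketch. This is not tsv-sample's implementation; the function names, the use of SHA-256, and the `salt` parameter are assumptions made for the illustration. Stream sampling makes an independent random choice per record, while the distinct sampling sketch hashes the key so that all records sharing a key are kept or dropped together.

```python
import hashlib
import random


def bernoulli_sample(lines, rate, rng):
    """Stream sampling: keep each line independently with probability `rate`."""
    return [line for line in lines if rng.random() < rate]


def distinct_sample(lines, key_index, rate, salt="", delimiter="\t"):
    """Distinct sampling: keep all lines for roughly `rate` of the unique keys.

    The key is hashed (hypothetical choice: SHA-256) and the line is kept when
    the hash falls in the lowest `rate` fraction of the 64-bit hash space.
    The same key always hashes to the same value, so a key is either fully
    included or fully excluded. Changing `salt` selects a different key subset.
    """
    threshold = rate * 2**64
    kept = []
    for line in lines:
        key = line.split(delimiter)[key_index]
        digest = hashlib.sha256((salt + key).encode("utf-8")).digest()
        if int.from_bytes(digest[:8], "big") < threshold:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Hypothetical input: 1000 records spread over 50 IP addresses.
    rows = ["10.0.0.%d\tevent%d" % (i % 50, i) for i in range(1000)]
    by_rate = bernoulli_sample(rows, 0.1, random.Random(42))
    by_key = distinct_sample(rows, key_index=0, rate=0.1)
    print(len(by_rate), "records kept by stream sampling")
    print(len({r.split("\t")[0] for r in by_key}), "IP addresses kept by distinct sampling")
```

With a small number of unique keys, the fraction of keys selected by the hashing approach can differ noticeably from the requested rate; it converges as the number of unique keys grows.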
Other changes
- tsv-summarize bug fix: incorrect headers on two operations.
- Windows line ending detection when running on Unix platforms (Issue #96).
- tsv-select performance improvement: Avoid unnecessary memory allocation from std.array.join. A 5% performance improvement and less memory allocation.