A utility to perform large updates or deletes in batches to improve performance.
TRUNCATE TABLE is obviously a faster way to purge an entire table, but in many cases you have an enormous table from which you need to remove or update only a chunk. Copying the rows you want to keep (for a mass delete) can run into all sorts of referential constraint issues.
After testing many different approaches, I've created this tool, which generates singleton updates or deletes for the rows in question. If you want to, you can output the generated SQL to a flat file and process it in some other way. If you use the -execute option, batcher will use Go's built-in concurrency to perform your mass update, so you don't have to worry about long transactions or the excruciating slowness of some databases when doing set operations.
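As a rough illustration of the idea (not batcher's actual source), the sketch below builds one DELETE per key and hands the statements to a fixed pool of goroutines. The names `buildSingletonDeletes` and `runConcurrently`, the integer key column, and the `exec` callback standing in for a real database call are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// buildSingletonDeletes returns one DELETE statement per key value.
// Hypothetical helper: it assumes an integer key column, so no quoting
// or escaping is needed in this sketch.
func buildSingletonDeletes(table, keyColumn string, keys []int64) []string {
	stmts := make([]string, 0, len(keys))
	for _, k := range keys {
		stmts = append(stmts, fmt.Sprintf("DELETE FROM %s WHERE %s = %d;", table, keyColumn, k))
	}
	return stmts
}

// runConcurrently feeds the statements to a fixed number of workers.
// exec is a stand-in for a database call such as (*sql.DB).Exec, so each
// statement would run as its own short transaction.
// It returns how many statements succeeded and the first error seen.
func runConcurrently(stmts []string, concurrency int, exec func(stmt string) error) (int, error) {
	if concurrency < 1 {
		concurrency = 1
	}
	jobs := make(chan string)
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		done     int
		firstErr error
	)
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for stmt := range jobs {
				err := exec(stmt)
				mu.Lock()
				if err != nil {
					if firstErr == nil {
						firstErr = err
					}
				} else {
					done++
				}
				mu.Unlock()
			}
		}()
	}
	// Hand out the work, then signal the workers that no more is coming.
	for _, s := range stmts {
		jobs <- s
	}
	close(jobs)
	wg.Wait()
	return done, firstErr
}
```

Calling `runConcurrently(stmts, 20, exec)` would mirror the default `-concurrency` of 20. Because each statement touches a single row, a failure affects only that row rather than rolling back one huge transaction.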
No names, no packdrill.
pg-batch - Thank you!
Database | Version | Supported | CI Test Status | Notes |
---|---|---|---|---|
Cockroach | 20.1.3+ | Yes | 20.2.3 | Versions 19.0+ should work |
Informix | 12.10+ | No | No | Next on the list |
MariaDB | 10.5+ | Yes | 10.5 | |
MySQL | 8.0+ | Yes | 8.0.19 | Earlier (5.x) versions don't work |
Oracle | 12+ | No | No | After Informix |
PostgreSQL | 13.1+ | Yes | No | CI is an ongoing project |
SQLServer | 2019 | No | No | Linux will be first, then Windows (maybe!) |
Binaries for Mac, Linux and Windows, as well as the source code in zip and tar.gz can be found here.
If you want to build the source, I used Go 1.15.6.
On Mac, you can also:
brew tap SpokeyWheeler/tap
brew install batcher
If you're into Docker, you can:
docker run -it spokey/batcher:latest
$ ./batcher
'update', 'delete', 'version' or 'help' subcommand is required
flags:
  -concurrency int
        concurrency (default 20)
  -database string
        database name
  -dbtype string
        database type, e.g. postgres, informix, oracle, mysql (default "postgres")
  -execute
        execute the operation ('dry-run' only by default)
  -host string
        host name or IP (default "localhost")
  -opts string
        JDBC URL options (e.g. sslmode=disable) (default "sslmode=require")
  -password string
        password
  -portnum string
        port number (default "26257")
  -set string
        e.g. 'column_name=value, column_name=value ...' (ignored if provided with delete subcommand)
  -table string
        table name
  -user string
        user name
  -verbose
        provide detailed output (will output all statements to the screen)
  -where string
        e.g. 'column=value AND column IS NOT NULL ...'
This can seriously mess up your day if you get it wrong. Please dry run first to make sure the statement that will run is the one you want!
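To make the dry-run advice concrete, here is a minimal sketch of how the -table, -set and -where values could combine into a statement, and how a dry run differs from -execute. It is not batcher's code: `buildStatement` and `run` are hypothetical names, and rejecting an empty WHERE clause is an assumption of this example rather than documented behaviour.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// buildStatement assembles the SQL for a subcommand from the -table, -set
// and -where values. Illustrative helper only, not batcher's own logic.
func buildStatement(subcommand, table, set, where string) (string, error) {
	if strings.TrimSpace(table) == "" {
		return "", errors.New("table name is required")
	}
	if strings.TrimSpace(where) == "" {
		// Assumption for this sketch: refuse to touch every row by accident.
		return "", errors.New("where clause is required")
	}
	switch subcommand {
	case "delete":
		// -set is ignored for delete, as the help text states.
		return fmt.Sprintf("DELETE FROM %s WHERE %s;", table, where), nil
	case "update":
		if strings.TrimSpace(set) == "" {
			return "", errors.New("set clause is required for update")
		}
		return fmt.Sprintf("UPDATE %s SET %s WHERE %s;", table, set, where), nil
	default:
		return "", fmt.Errorf("unknown subcommand %q", subcommand)
	}
}

// run prints the statement on a dry run and calls exec only when execute
// is true. exec stands in for the real database call.
func run(stmt string, execute bool, exec func(string) error) error {
	if !execute {
		fmt.Println("dry run:", stmt)
		return nil
	}
	return exec(stmt)
}
```

Reading the printed statement before adding -execute is the cheap way to catch a wrong column name or an overly broad WHERE clause.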
It turns out that SaaS CI across multiple databases is very, very hard! I've tried CircleCI, GitHub Actions and Travis-CI, and the best I was able to manage was three out of the four I originally aimed for. (Only two are currently supported because I tried Travis last, could only get two to work, and now can't be bothered to go back!)
Must give massive props to CircleCI support - ridiculously prompt and really, really good.
I haven't given up on it, but I'm going to work on that separately because I'm sick of cluttering up this project with hundreds of meaningless CI-related commit messages and version bumps.
Also, from a CI perspective, Cockroach's "download a single binary and put it in your path" approach has been a delight compared to watching hours of APT output before you can test your change. It's not completely PostgreSQL compatible, but if you're considering a new project with lightweight requirements, it could be an interesting option. PostgreSQL itself isn't too bad, but it's still more work. MariaDB is not a drop-in replacement for MySQL, IMHO, because the process followed after an official APT install is different. Both MariaDB and MySQL need to sort out their character set and collation sequence issues; nobody has time for that shit.
My next stop will be to try Semaphore CI.
Semaphore CI is amazing: stunningly fast and I managed to get everything working!
- Commercial databases!
- Examples
- Trickle truncation?