PostgreSQL 16 upgrade #2968
Unanswered
plasticviking asked this question in General
-
I’ve done a ton of research and collected my thoughts on our best possible upgrade path to a modern version of Postgres.
We’re currently using version 9.6 (which was EOL’d in November 2021) both in the cluster and in docker. I’m also using 9.6 on a local bare-metal Debian server for compatibility/testing.
Upgrading to 16 brings a number of benefits (mostly totally transparent things like better parallel query performance), but also some great quality of life stuff like better/more mature JSON handling.
Upgrading, especially across that many versions at once, is not trivial, and we have three environments (local, Docker, OpenShift) and two scenarios (upgrade-in-place, fresh install) to consider, for six combinations in total.
The `pg_upgrade` tool (bundled with the Postgres server packages) works well. I used it on my local server to go directly from 9.6 to 16. It did choke on one invalid function definition, so it’s necessary to do it like this:
a) install both versions in parallel
b) connect to 9.6 and `drop function public.convert_string_list_to_array_elements("unknown");`
c) shut down both postmasters
d) run `pg_upgrade`
e) start up the new service, keeping the old one offline as a backup
f) run the generated `update_extensions.sql` file that `pg_upgrade` makes for you
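For reference, the invocation looks roughly like this (the paths here are assumptions for a stock Debian layout; adjust them to match yours):

```sh
# Minimal sketch, assuming stock Debian paths; run as the postgres user
# with both postmasters stopped.
/usr/lib/postgresql/16/bin/pg_upgrade \
  --old-bindir=/usr/lib/postgresql/9.6/bin \
  --new-bindir=/usr/lib/postgresql/16/bin \
  --old-datadir=/var/lib/postgresql/9.6/main \
  --new-datadir=/var/lib/postgresql/16/main \
  --old-options='-c config_file=/etc/postgresql/9.6/main/postgresql.conf' \
  --new-options='-c config_file=/etc/postgresql/16/main/postgresql.conf' \
  --check   # dry run first; drop --check to do the real upgrade
```

The `--old-options`/`--new-options` lines are needed because Debian keeps the config files under /etc/postgresql rather than in the data directories.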
Works great locally. It won’t work in the cluster without serious docker-image manipulation, since the filesystem is read-only. I think the way to handle this in-cluster is:
Do it in dev and test first, prod after validation… (a rough sketch of the dump/restore commands follows this list)
a) shut off the API server so no data is being modified
b) modify the OpenShift service definition for the database to point to nothing (delete the selector)
c) shell into the pod and run a `pg_dump` of prod data
d) download it locally and load it into a fresh, running local 9.6 service
e) do the above steps to upgrade the database
f) run `pg_dump` again and load the result into a new instance running version 16 (which may optionally be a CrunchyDB cluster)
g) set up the service definition to point to the new pods, with the same name as before. It should be transparent to the API pod.
h) validate everything by testing the app. If something doesn’t look right, revert step (b) to get the old service back.
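For concreteness, here is a rough sketch of the dump/restore legs (the pod, database, and port names are invented; substitute our real ones):

```sh
# c) dump prod from the 9.6 pod (custom format, so pg_restore can parallelize)
oc exec postgres-96-pod -- pg_dump -U postgres -Fc appdb > appdb-96.dump

# d/e) load the dump into a fresh local 9.6, then pg_upgrade it to 16 as above

# f) dump the upgraded local database and restore it into the new 16 pods
pg_dump -U postgres -h localhost -p 5433 -Fc appdb > appdb-16.dump
oc exec -i postgres-16-pod -- pg_restore -U postgres -d appdb < appdb-16.dump
```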
This doesn’t handle local docker upgrades. To be honest, I think the best approach for local docker is to upgrade the image definitions, delete the old data volumes, and make a clean break. If that’s not possible, something very similar to the cluster procedure above will work as well. I can be available to generate these upgraded dumps for anyone who really needs them.
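The clean break amounts to roughly this (the compose volume and service names are made up):

```sh
docker compose down                # stop the stack
docker volume rm myapp_pgdata      # destroys the old 9.6 data volume!
# bump the image tag in docker-compose.yml: postgres:9.6 -> postgres:16
docker compose up -d db            # fresh 16 data dir; then run migrations/seeds
```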
Speaking of fresh installs:
Our migrations are totally broken. They cannot run successfully on a fresh install right now; I don’t know if anyone has tried recently. They’ll blow up on 0029, and then again on a bunch of others if you delete the offending definitions (view definitions referencing tables or views that don’t exist). I tried untangling it, but there are dependencies on certain seeds being run at certain points in the process, and it quickly became a real headache.
They don’t work on either 9.6 or 16 right now for a totally clean install. I have a fix to suggest here too. Assuming that prod is the data closest to what is required (I know it’s at migration 0075 and latest is 0076, but that’s easily bumped up), I propose that we freeze the creation of new migrations while we do the database upgrade, and shortly after upgrading prod we take it as the "gold standard": do a `pg_dump --schema-only` into a new migration `0001_fresh_start` to capture the structure, and create a seed with the /current/ values of the code tables (and other necessary ancillary tables like email_templates). Have the migration run the seed (you can do this with `knex` programmatically; see the sketch after this list). Truncate the migration table in dev/test/prod and manually enter `0001_fresh_start` as having been run. Do the same for any other environment that has data (local and docker instances). The end result of this unpleasant task is:
a) One migration and one seed, both of which are known to work on Postgres 16 and have the structure we actually need
b) The ability to do fresh installs
c) We can clean up some of the things that are broken in our current migrations, like hard-coded schema/role names.
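A rough shape for that migration (the file and seed names are placeholders, and it assumes knex’s programmatic seed runner is callable from inside a migration):

```js
// migrations/0001_fresh_start.js -- a sketch only; names are placeholders
const fs = require('fs');
const path = require('path');

exports.up = async function (knex) {
  // replay the schema captured from prod with `pg_dump --schema-only`
  const ddl = fs.readFileSync(
    path.join(__dirname, '0001_fresh_start.sql'), 'utf8');
  await knex.raw(ddl);
  // run the code-table seed programmatically
  await knex.seed.run({ specific: '0001_code_tables.js' });
};

exports.down = async function () {
  throw new Error('0001_fresh_start is not reversible');
};
```

Marking it as applied in environments that already have the schema is then just the manual truncate-and-insert on the knex migrations table described above.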
I propose to `pg_upgrade` dev to version 16, without modifying any data.
-
Replies: 1 comment

I don't have a problem with doing one fresh migration to get past the upgrade; we did the same thing for knex at one point. I think on the shortlist after the upgrade we should look at making some better seeds for testing, which should save us from "can't build from scratch" states like the one you're in now.