
Pm 663 control dump scratch area location #5

Open
wants to merge 6 commits into master

Conversation

cameron-simpson

Primarily let $PGDUMP_BACKUP_AREA override the default /pg_dump scratch area.

Other small changes.

aweakley requested a review from mrmachine May 17, 2021 23:35
@aweakley
Member

(I've made a new branch with this change merged in, so we can get the build to run etc.)

bin/backup.sh Outdated
@@ -4,6 +4,8 @@ set -e

setup.sh

max_pg_wait_count=120
Collaborator

Make this configurable via env var and default to 120?
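
Something like this, say (MAX_PG_WAIT_COUNT is just a name to pick):

    # Hypothetical env var name; falls back to 120 when unset.
    max_pg_wait_count=${MAX_PG_WAIT_COUNT:-120}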

bin/backup.sh Outdated
echo "Waiting for PostgreSQL to become available..."
fi
(( COUNT += 1 ))
(( count += 1 ))
[ $count -lt $max_pg_wait_count ] || break
Collaborator

Why [] here instead of [[]]? Can we stick to [[]], or even if [[ ... ]]; then ... fi (as above) for consistency.

Author

Mostly because I pretty much never use [[ in my own scripts - its utility is generally tiny and it is less portable. I'll tweak this for consistency though.
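
For reference, the two forms behave the same for this numeric test:

    # POSIX test builtin: portable to any sh; quote the variables.
    [ "$count" -lt "$max_pg_wait_count" ] || break

    # bash extended test: no word splitting inside [[ ]], but bash-only.
    [[ $count -lt $max_pg_wait_count ]] || break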

@@ -5,6 +5,7 @@ set -e
setup.sh

max_pg_wait_count=120
work_area=${PGDUMP_BACKUP_AREA:-/pg_dump}
Collaborator

Why does the work area need to be configurable? Do you want to bind mount it from a different host volume, and are then unable to rm -rf the bind mounted directory?

Maybe we just rm /pg_dump/*.sql, instead? And then you can bind mount /pg_dump from anywhere, and we can also mkdir /pg_dump in Dockerfile and no longer need mkdir -p.
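
Roughly this, as a sketch (where exactly the mkdir lands in the Dockerfile is an assumption):

    # Dockerfile: create the mount point once at build time.
    RUN mkdir /pg_dump

    # backup.sh: clear old dumps without removing the (possibly bind
    # mounted) directory itself; -f silences the no-match case.
    rm -f /pg_dump/*.sql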

Author

Because we want this in a different volume, and forcing the work area to live in / only makes sense inside a container, and a pretty specific container at that. While this code is for a container, making it depend on that is a pain point waiting to happen (as has just happened to us).

Making the path configurable lets you tune things without hand-configuring the container build with a bind mount. Far better to have the script be somewhat generic by letting you tell it where to work.

One strong advantage of the mkdir is that we know we made the work area and nobody else can be using it. Otherwise the rm -rf below, or your suggested rm *.sql, is just asking to damage something else's work. With mkdir, we made it and we're free to unmake it.
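
In sketch form, the pattern I mean (the dump command and filename are illustrative):

    # A plain mkdir fails if the directory already exists, so a second
    # concurrent run stops here instead of trampling a backup in progress.
    mkdir "$work_area" || exit 1
    pg_dump "$DATABASE_URL" >"$work_area/db.sql"
    # We made the area, so we are free to unmake it.
    rm -rf "$work_area"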

Collaborator

I am not sure I follow. Are you using this script outside of a container? From what I have seen, it is fairly common for Docker images to store data in a subdirectory of /.

As far as I can tell, there is no point in making the directory configurable unless you want to use a bind mount, which you must hand configure in your compose file or docker run command; and completely removing and recreating that directory would require you to nominate a subdirectory of the mount point.

I think we could just use /data/pg_dump as the directory and you could then use -v /some/host/dir:/data and we could continue to destroy and recreate the pg_dump subdirectory, without this extra configuration or documentation.

Fair point about the safety of retaining the mkdir and exiting if the directory already exists. Though, if something ever goes wrong and a container exits (e.g. OOM killed) before having removed the directory, the container will not be able to restart and resume backups without intervention?
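
Concretely (the host path and image name are illustrative):

    # Host side: mount any host directory at /data.
    docker run -v /some/host/dir:/data backup-image

    # In the container, the subdirectory can still be destroyed and recreated:
    rm -rf /data/pg_dump
    mkdir /data/pg_dump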

Comment on lines -43 to +51
mkdir -p "/pg_dump"
mkdir "$work_area" || exit 1
Collaborator

Can you elaborate on the problem with -p? If we do drop it, we don't need || exit 1 because we have set -e already.

Author

The issue with -p is twofold:

  • it makes intermediate directories (that's its entire feature) - not an issue with a hardwired /pg_dump, but definitely an issue with longer paths, where accidents in the path, e.g. /tmpp/work_area, do not get noticed - badness just gets created in the filesystem.
  • if the directory already exists, it is not an error - this means that 2 instances of the script can both "make" the work area, both try to use it concurrently, and both try to blow it away. All of those things can variously lead to corrupt/truncated backups or removed-during-processing backups. All bad. mkdir -p in a script is usually a bug magnet. By not using -p we are asking to make a specific directory which we want to be (a) new and (b) in a place where it is expected. -p discards both benefits.

Why might 2 instances of the script be running? If the backups take a very long time - something which creeps up on you with a fixed cron-like schedule. Forcing a plain mkdir is robust.

I like || exit to make the flow control explicit, regardless of -e. I use both -e and -u in my own scripts, but still make an exit like the above explicit; it aids readability. Also, subshells have a fun side effect where -e effectively gets turned off and needs re-enabling. Explicit flow control reduces the opportunity for this kind of misadventure.
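
A minimal illustration of the subshell point (bash):

    #!/bin/bash
    set -e
    # By default bash clears -e inside command substitutions, so the
    # "false" here does not abort the subshell: out becomes "still here"
    # and the script carries on.
    out=$(false; echo "still here")
    echo "$out"
    # bash >= 4.4 can restore -e in $(...) with: shopt -s inherit_errexit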

@cameron-simpson
Author

cameron-simpson commented May 18, 2021 via email

@cameron-simpson
Author

cameron-simpson commented May 18, 2021 via email

@mrmachine
Collaborator

I think this debate is largely academic and ideological, and comes down to personal preference and how strictly we want to treat the container environment like a traditional Linux environment.

I think the core issue needing to be fixed is:

  • Make it possible to use a bind mount to store temporary pg_dump data, so we can take advantage of additional storage volumes on the host.

And the thing standing in the way of that is:

  • Our use of rm -rf /pg_dump to clean up.

I think we have two possible solutions -- both work by avoiding our attempt to remove the data directory, which may now be a bind mounted volume:

  1. Use rm /pg_dump/* to remove its contents only; or
  2. Store *.sql files in a subdirectory like /pg_dump/data so you can bind mount /pg_dump and continue to rm -rf /pg_dump/data.

I am not fussed either way, between those two options.

I do not think we need to strictly only write to a directory that we have first created and then destroy it, as a kind of lock to protect against concurrent execution or accidental data loss.

Plus, mkdir foo || exit 1 is not going to fail early/loud on misconfiguration at startup. It is going to fail silently in the middle of the night (after being OOM killed), unless you externally monitor the log for mkdir foo: File exists, or for a lack of successful backup confirmations. It will not cause the container to exit.

We can entirely avoid the need to mkdir at runtime with option 1, by creating the directory in Dockerfile and never removing it.
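
A sketch of option 2, for concreteness:

    # backup.sh: /pg_dump may be a bind mount; only the subdirectory
    # is disposable, so the mount point itself is never removed.
    rm -rf /pg_dump/data
    mkdir /pg_dump/data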
