
Pm 663 control dump scratch area location #5

Open
wants to merge 6 commits into master

Conversation

cameron-simpson

Primarily let $PGDUMP_BACKUP_AREA override the default /pg_dump scratch area.

Other small changes.

aweakley requested a review from mrmachine May 17, 2021 23:35
@aweakley
Member

(I've made a new branch with this change merged in, so we can get the build to run etc.)

bin/backup.sh Outdated
@@ -4,6 +4,8 @@ set -e

setup.sh

max_pg_wait_count=120
Collaborator

Make this configurable via env var and default to 120?
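
Something like this, say (MAX_PG_WAIT_COUNT is just a name to pick):

    # Hypothetical env var name; falls back to 120 when unset.
    max_pg_wait_count=${MAX_PG_WAIT_COUNT:-120}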

bin/backup.sh Outdated
echo "Waiting for PostgreSQL to become available..."
fi
(( COUNT += 1 ))
(( count += 1 ))
[ $count -lt $max_pg_wait_count ] || break
Collaborator

Why [] here instead of [[]]? Can we stick to [[]], or even if [[ ... ]]; then ... fi (as above) for consistency.

Author

Mostly because I pretty much never use [[ in my own scripts - its utility is generally tiny and it is less portable. I'll tweak this for consistency though.
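
For reference, the two forms behave the same for this numeric test:

    # POSIX test builtin: portable to any sh; quote the variables.
    [ "$count" -lt "$max_pg_wait_count" ] || break

    # bash extended test: no word splitting inside [[ ]], but bash-only.
    [[ $count -lt $max_pg_wait_count ]] || break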

@@ -5,6 +5,7 @@ set -e
setup.sh

max_pg_wait_count=120
work_area=${PGDUMP_BACKUP_AREA:-/pg_dump}
Collaborator

Why does the work area need to be configurable? Do you want to bind mount it from a different host volume, and are then unable to rm -rf the bind mounted directory?

Maybe we just rm /pg_dump/*.sql, instead? And then you can bind mount /pg_dump from anywhere, and we can also mkdir /pg_dump in Dockerfile and no longer need mkdir -p.
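
Roughly this, as a sketch (where exactly the mkdir lands in the Dockerfile is an assumption):

    # Dockerfile: create the mount point once at build time.
    RUN mkdir /pg_dump

    # backup.sh: clear old dumps without removing the (possibly bind
    # mounted) directory itself; -f silences the no-match case.
    rm -f /pg_dump/*.sql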

Author

Because we want this in a different volume, and forcing the work area to live in / only makes sense inside a container, and a pretty specific container at that. While this code is for a container, making it depend on that is a pain point waiting to happen (as has just happened to us).

Making the path configurable lets you tune things without hand-configuring the container build with a bind mount. Far better to have the script be somewhat generic by letting you tell it where to work.

One strong advantage of the mkdir is that we know we made the work area and nobody else can be using it. Otherwise the rm -rf below, or your suggested rm *.sql, is just asking to damage something else's work. With mkdir, we made it and we're free to unmake it.
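
In sketch form, the pattern I mean (the dump command and filename are illustrative):

    # A plain mkdir fails if the directory already exists, so a second
    # concurrent run stops here instead of trampling a backup in progress.
    mkdir "$work_area" || exit 1
    pg_dump "$DATABASE_URL" >"$work_area/db.sql"
    # We made the area, so we are free to unmake it.
    rm -rf "$work_area"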

Collaborator

I am not sure I follow. Are you using this script outside of a container? From what I have seen, it is fairly common for Docker images to store data in a subdirectory of /.

As far as I can tell, there is no point in making the directory configurable unless you want to use a bind mount, which you must hand configure in your compose file or docker run command; and completely removing and recreating that directory would require you to nominate a subdirectory of the mount point.

I think we could just use /data/pg_dump as the directory and you could then use -v /some/host/dir:/data and we could continue to destroy and recreate the pg_dump subdirectory, without this extra configuration or documentation.

Fair point about the safety of retaining the mkdir and exiting if the directory already exists. Though, if something ever goes wrong and a container exits (e.g. OOM killed) before having removed the directory, the container will not be able to restart and resume backups without intervention?
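
Concretely (the host path and image name are illustrative):

    # Host side: mount any host directory at /data.
    docker run -v /some/host/dir:/data backup-image

    # In the container, the subdirectory can still be destroyed and recreated:
    rm -rf /data/pg_dump
    mkdir /data/pg_dump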

Comment on lines -43 to +51
mkdir -p "/pg_dump"
mkdir "$work_area" || exit 1
Collaborator

Can you elaborate on the problem with -p? If we do drop it, we don't need || exit 1 because we have set -e already.

Author

The issue with -p is twofold:

  • it makes intermediate directories (that's its entire feature) - not an issue with a hardwired /pg_dump, but definitely an issue with longer paths, where accidents in the path, e.g. /tmpp/work_area, do not get noticed - badness just gets created in the filesystem.
  • if the directory already exists, it is not an error - this means that 2 instances of the script can both "make" the work area, both try to use it concurrently, and both try to blow it away. All of those things can variously lead to corrupt/truncated backups or removed-during-processing backups. All bad. mkdir -p in a script is usually a bug magnet. By not using -p we are asking to make a specific directory which we want to be (a) new and (b) in a place where it is expected. -p discards both benefits.

Why might 2 instances of the script be running? If the backups take a very long time - something which creeps up on you with a fixed cron-like schedule. Forcing a plain mkdir is robust.

I like || exit to make the flow control explicit, regardless of -e. I use both -e and -u in my own scripts, but still make an exit like the above explicit; it aids readability. Also, subshells have a fun side effect where -e effectively gets turned off and needs re-enabling. Explicit flow control reduces the opportunity for this kind of misadventure.
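
A minimal illustration of the subshell point (bash):

    #!/bin/bash
    set -e
    # By default bash clears -e inside command substitutions, so the
    # "false" here does not abort the subshell: out becomes "still here"
    # and the script carries on.
    out=$(false; echo "still here")
    echo "$out"
    # bash >= 4.4 can restore -e in $(...) with: shopt -s inherit_errexit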

@cameron-simpson
Author

cameron-simpson commented May 18, 2021 via email

@cameron-simpson
Author

cameron-simpson commented May 18, 2021 via email

@mrmachine
Collaborator

I think this debate is largely academic and ideological, and comes down to personal preference and how strictly we want to treat the container environment like a traditional Linux environment.

I think the core issue needing to be fixed is:

  • Make it possible to use a bind mount to store temporary pg_dump data, so we can take advantage of additional storage volumes on the host.

And the thing standing in the way of that is:

  • Our use of rm -rf /pg_dump to clean up.

I think we have two possible solutions -- both work by avoiding our attempt to remove the data directory, which may now be a bind mounted volume:

  1. Use rm /pg_dump/* to remove its contents only; or
  2. Store *.sql files in a subdirectory like /pg_dump/data so you can bind mount /pg_dump and continue to rm -rf /pg_dump/data.

I am not fussed either way, between those two options.

I do not think we need to strictly only write to a directory that we have first created and then destroy it, as a kind of lock to protect against concurrent execution or accidental data loss.

Plus, mkdir foo || exit 1 is not going to fail early/loud on misconfiguration at startup. It is going to fail silently in the middle of the night (after being OOM killed), unless you externally monitor the log for mkdir foo: File exists, or for a lack of successful backup confirmations. It will not cause the container to exit.

We can entirely avoid the need to mkdir at runtime with option 1, by creating the directory in Dockerfile and never removing it.
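
A sketch of option 2, for concreteness:

    # backup.sh: /pg_dump may be a bind mount; only the subdirectory
    # is disposable, so the mount point itself is never removed.
    rm -rf /pg_dump/data
    mkdir /pg_dump/data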
