-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pm 663 control dump scratch area location #5
base: master
Are you sure you want to change the base?
Pm 663 control dump scratch area location #5
Conversation
…ackups instead of /pg_dump
(I've made a new branch with this change merged in, so we can get the build to run etc.) |
bin/backup.sh
Outdated
@@ -4,6 +4,8 @@ set -e | |||
|
|||
setup.sh | |||
|
|||
max_pg_wait_count=120 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this configurable via env var and default to 120?
bin/backup.sh
Outdated
echo "Waiting for PostgreSQL to become available..." | ||
fi | ||
(( COUNT += 1 )) | ||
(( count += 1 )) | ||
[ $count -lt $max_pg_wait_count ] || break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why []
here instead of [[]]
? Can we stick to [[]]
, or even if [[ ... ]]; then ... fi
(as above) for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly because I pretty much never use [[ in my own scripts - its utility is generally tiny and it is less portable. I'll tweak this for consistency though.
@@ -5,6 +5,7 @@ set -e | |||
setup.sh | |||
|
|||
max_pg_wait_count=120 | |||
work_area=${PGDUMP_BACKUP_AREA:-/pg_dump} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the work area need to be configurable? Do you want to bind mount it from a different host volume, and are then unable to rm -rf
the bind mounted directory?
Maybe we just rm /pg_dump/*.sql
, instead? And then you can bind mount /pg_dump
from anywhere, and we can also mkdir /pg_dump
in Dockerfile
and no longer need mkdir -p
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we're wanting this in a different volume, and force making this in /
is a thing that only makes sense inside a container, and a pretty specific container at that. While this code is for a container, making it dependent on that is a pain point waiting to happen (as has just happened to us).
Making the path configurable lets you tune things without hand configuring the container build with a bind mount. Far better to have the script be somewhat generic by being able to tell it where to work.
One stong advantage of the mkdir is that we know we make the work and nobody else might be using it. Otherwise the rm -rf
below, or your suggested rm *.sql
are just asking to damage something else's work. With mkdir, we made it and we're free to unmake it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I follow. Are you using this script outside of a container? From what I have seen, it is fairly common for Docker images to store data in a subdirectory of /
.
As far as I can tell, there is no point in making the directory configurable unless you are wanting to use a bind mount, which must be hand configured by you in your compose file or docker run
command, and needing to completely remove and recreate that directory which would require you nominate a subdirectory of the mount point.
I think we could just use /data/pg_dump
as the directory and you could then use -v /some/host/dir:/data
and we could continue to destroy and recreate the pg_dump
subdirectory, without this extra configuration or documentation.
Fair point about the safety of retaining the mkdir and exit if it already exists. Though, if something ever goes wrong and a container exits (e.g. OOM killed) before having removed the directory, the container will not be able to restart resume backups without intervention?
mkdir -p "/pg_dump" | ||
mkdir "$work_area" || exit 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on the problem with -p
? If we do drop it, we don't need || exit 1
because we have set -e
already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with -p
is twofold:
- it makes intermediate directories (that's its entire feature) - not an issue with a hardwired
/pg_dump
but definitely an issue with longer paths - with a longer path accidents in the longer path eg/tmpp/work_area
do not get notice - badness just gets created in the filesystem. - if the directory already exists, it is not an error - this means that 2 instances of the script can both "make" the work area, both try to use in concurrently, and both try to blow it away. All those things can variously lead to corrupt/truncated backups or removed-during-processing backups. All bad.
mkdir -p
in a script is usually a bug magnet. By not having-p
we are just asking to make a specific directory we want to be (a) new and (b) in a place where it is expected.-p
discards these benefits.
Why might 2 scripts be running? If the backups take a very long time, something which creeps up on one for a fixed cronlike schedule. Forcing a mkdir is robust.
I like || exit
to make the flow control explicit, -e
regardless. I use both -e
and -u
in my own scripts, but still make an exit
like the above explicit. It aids readability. Also, that's a fun side effect of subshells, where -e
effectively gets turned off and needs reenabling. Explicit flow control reduces the opportunity for this kind of misadventure.
On 17May2021 19:33, Tai Lee ***@***.***> wrote:
I am not sure I follow. Are you using this script outside of a
container?
No, but why make it deliberately fragile in this way when we can just do
the conventional thing of letting the script be told where to work?
The day someone wants to test this or reuse it outside a container the
same issue will just rearise.
From what I have seen, it is fairly common for Docker images
to store data in a subdirectory of /.
So? I suppose if you know you're in a container you can get away with
stuff like that, but there seem to be several "in Dcoker" things I've
seen of which i remain of poor opinion. This is getting on that list.
Containers look like standalone machines from the inside in most
respects. Putting things in / is effectively putting things in the
"vendor" filesystem area - if your container builds on some third party
template container or if someone else expects to build on your
container, that space is a place of conflict unless you stay out of it.
Something like a work area for backouts ought to be off in /var
somewhere, or a scratch area like /tmp or a (utility/service) user like
/home/whatever. Putting things at the top is almost never a good idea.
Having your script honour an environment variable lets you sidestep any
such issues entirely with docker env settings without changing the
container imgae your script came from.
As far as I can tell, there is no point in making the directory
configurable unless you are wanting to use a bind mount, which must be
hand configured by you in your compose file or docker run command, and
needing to completely remove and recreate that directory which would
require you nominate a subdirectory of the mount point.
I think we could just use /data/pg_dump as the directory and you
could then use -v /some/host/dir:/data and we could continue to destroy and
recreate the pg_dump subdirectory, without this extra configuration or
documentation.
That would be acceptable to my opinions.
Fair point about the safety of retaining the mkdir and exit if it
already exists. Though, if something ever goes wrong and a container
exits (e.g. OOM killed) before having removed the directory, the
container will not be able to restart resume backups without
intervention?
Correct, but at least you get an error message with an obvious cause
rather than silent or obtuse failure/corruption depending on how things
play out. It's fail early rather than fail during (or, wosrse, _not_
fail during but mangle the backups).
|
On 18May2021 13:37, Cameron Simpson ***@***.***> wrote:
On 17May2021 19:33, Tai Lee ***@***.***> wrote:
> I think we could just use /data/pg_dump as the directory and you
> could then use -v /some/host/dir:/data and we could continue to destroy and
> recreate the pg_dump subdirectory, without this extra configuration or
> documentation.
That would be acceptable to my opinions.
Actually, not quite. /data's pretty generic. The mount/bind should be
per purpose, therefore something /data/pg_dump, with the work area being
made within it. Otherwise there's no scope for separating work areas in
/data among different purposes.
|
I think this debate is largely academic and idealogical, and comes down to personal preference, and how strictly we want to treat the container environment like a traditional Linux environment. I think the core issue needing to be fixed is:
And the thing standing in the way of that is:
I think we have two possible solutions -- both work by avoiding our attempt to remove the data directory, which may now be a bind mounted volume:
I am not fussed either way, between those two options. I do not think we need to strictly only write to a directory that we have first created and then destroy it, as a kind of lock to protect against concurrent execution or accidental data loss.
Plus, We can entirely avoid the need to |
Primarily let $PGDUMP_BACKUP_AREA override the default /pg_dump scratch area.
Other small changes.