[Kubernetes cronjob] pg_isready only works interactively (pod user permissions maybe?) #361

Open
seano-vs opened this issue Aug 12, 2024 · 1 comment
@seano-vs

Summary

TL;DR: pg_isready only works when I run it interactively in the pod, not when the job runs it automatically. This started after I manually added the PGSSLMODE=require environment variable, which I needed because the connection was failing with a /root/.postgresql/postgresql.crt: Permission denied error.

Steps to reproduce

Here's what I did:

  • Created the k8s job
  • Followed the steps here
  • After setting only the environment variables listed there (MODE=MANUAL, MANUAL_RUN_FOREVER=FALSE, CONTAINER_ENABLE_SCHEDULING=FALSE, and CONTAINER_ENABLE_MONITORING=FALSE), the container wouldn't start the backup on its own; it sat waiting for user input.
  • Adding /etc/services.available/10-db-backup/run to the "command" field of the container definition resulted in a "no such file or directory" error.
  • Following this comment, I set the command to ['/init', 'backup-now'] instead, which worked.
  • At that point, with all the env vars loaded, the connection to my Postgres server failed, citing /root/.postgresql/postgresql.crt: Permission denied.
  • I added PGSSLMODE=require to the env list, and the error went away.
  • Now pg_isready never reports the server as ready. To debug, I copied the command being executed from the debug logs and ran it interactively in the pod with pg_isready --host=$DB01_HOST --port=$DB01_PORT --dbname=$DB01_NAME --username=$DB01_USER, and it worked perfectly.
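Since the same command succeeds in an interactive shell but not inside the job, one thing worth ruling out is a difference in environment between the two. Below is a minimal Python sketch, not part of the image and assuming a Python 3 interpreter is available in the pod (it may not be in this image), that reads another process's initial environment from /proc/<pid>/environ (the NUL-separated environment a Linux process was started with) and prints the PG*, DB01_*, HOME and USER variables whose values differ from the current shell. The helper names are hypothetical, and the default PID of 1 is an assumption; ideally pass the PID of the running backup process.

```python
import os
from typing import Dict, Optional, Tuple

# Assumed set of variables that can change how pg_isready/libpq connects.
WATCHED_PREFIXES = ("PG", "DB01_")
WATCHED_KEYS = {"HOME", "USER"}


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Parse the NUL-separated KEY=VALUE format of /proc/<pid>/environ."""
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skips the empty trailing entry and malformed data
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def is_watched(key: str) -> bool:
    return key in WATCHED_KEYS or key.startswith(WATCHED_PREFIXES)


def diff_env(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (value_in_a, value_in_b)} for watched keys whose values differ."""
    keys = sorted(k for k in a.keys() | b.keys() if is_watched(k))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def redact(key: str, value: Optional[str]) -> Optional[str]:
    """Hide secrets such as DB01_PASS or PGPASSWORD so the output can be shared."""
    return "***" if value and "PASS" in key else value


if __name__ == "__main__":
    import sys

    pid = sys.argv[1] if len(sys.argv) > 1 else "1"
    try:
        with open(f"/proc/{pid}/environ", "rb") as fh:
            other = parse_environ(fh.read())
    except OSError as exc:
        print(f"cannot read environment of pid {pid}: {exc}")
    else:
        for key, (mine, theirs) in diff_env(dict(os.environ), other).items():
            print(f"{key}: interactive={redact(key, mine)!r} pid{pid}={redact(key, theirs)!r}")
```

Run it from the interactive shell with the target PID as the first argument; an empty result means the watched variables match.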

I suspect this is a permissions issue related to how the commands are executed (for example, which user they run as), but I'm not entirely sure.
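One way to test the permissions theory: when setting up SSL, libpq looks for a client certificate at ~/.postgresql/postgresql.crt. If the file is simply absent it continues without one, but any other error from that lookup (such as Permission denied) is reported as an error. The minimal Python sketch below, again assuming Python 3 is available in the container and not part of db-backup, reports the current uid, the home directory taken from $HOME, and what happens when that path is stat'ed. Depending on its version, libpq may resolve the home directory from the password database rather than $HOME, so treat that as an assumption to verify. A non-root uid combined with a home of /root would be consistent with the original error.

```python
import errno
import os
from typing import Dict, Optional

# stat() errors that libpq treats as "no client certificate" rather than a failure.
MISSING_ERRNOS = {errno.ENOENT, errno.ENOTDIR}


def check_client_cert(home: Optional[str] = None) -> Dict[str, object]:
    """Report whether this process can stat ~/.postgresql/postgresql.crt.

    `home` defaults to $HOME (assumed to match what libpq uses).
    """
    if home is None:
        home = os.environ.get("HOME", "")
    path = os.path.join(home, ".postgresql", "postgresql.crt")
    result: Dict[str, object] = {"uid": os.getuid(), "home": home, "path": path}
    try:
        os.stat(path)
        result["status"] = "exists"
    except OSError as exc:
        if exc.errno in MISSING_ERRNOS:
            # libpq just continues without a client certificate in this case.
            result["status"] = "missing"
        else:
            # e.g. EACCES when a non-root user cannot traverse /root.
            result["status"] = f"error: {os.strerror(exc.errno)}"
    return result


if __name__ == "__main__":
    for key, value in check_client_cert().items():
        print(f"{key}: {value}")
```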

I have the following k8s config:

apiVersion: batch/v1
kind: CronJob
metadata:
 name: postgres-storage-backup
 namespace: mastodon
spec:
 schedule: "30 1 * * *"
 concurrencyPolicy: Forbid
 suspend: false
 successfulJobsHistoryLimit: 1
 failedJobsHistoryLimit: 1
 jobTemplate:
   spec:
     template:
       metadata:
         name: postgres-storage-backup
       spec:
         volumes:
           - name: postgres-completion
             configMap:
               name: postgres-completion
               defaultMode: 0500
         containers:
           - name: postgres-storage-backup
             image: tiredofit/db-backup:4.1.3
             imagePullPolicy: IfNotPresent
             command:
               - /init
               - backup-now
             volumeMounts:
               - name: postgres-completion
                 mountPath: "/script"
             env:
               - name: DEBUG_MODE
                 value: "TRUE"
               - name: PGSSLMODE
                 value: "require"
               - name: MODE
                 value: "MANUAL"
               - name: MANUAL_RUN_FOREVER
                 value: "FALSE"
               - name: CONTAINER_ENABLE_SCHEDULING
                 value: "FALSE"
               - name: CONTAINER_ENABLE_MONITORING
                 value: "FALSE"
               - name: DEFAULT_POST_SCRIPT
                 value: "/script/postgres.sh"
               - name: DEFAULT_BACKUP_LOCATION
                 value: 'S3'
               - name: DEFAULT_S3_BUCKET
                 valueFrom:
                   configMapKeyRef:
                     name: storage-backup
                     key: postgres_bucket
               - name: DEFAULT_S3_KEY_ID
                 valueFrom:
                   configMapKeyRef:
                     name: storage-backup
                     key: DEFAULT_S3_KEY_ID
               - name: DEFAULT_S3_KEY_SECRET
                 valueFrom:
                   configMapKeyRef:
                     name: storage-backup
                     key: DEFAULT_S3_KEY_SECRET
               - name: DEFAULT_S3_REGION
                 valueFrom:
                   configMapKeyRef:
                     name: storage-backup
                     key: DEFAULT_S3_REGION
               - name: DEFAULT_S3_HOST
                 valueFrom:
                   configMapKeyRef:
                     name: storage-backup
                     key: DEFAULT_S3_HOST
               - name: DB01_TYPE
                 value: "pgsql"
               - name: DB01_HOST
                 valueFrom:
                   configMapKeyRef:
                     name: mastodon-env-tf
                     key: DB_HOST
               - name: DB01_PORT
                 valueFrom:
                   configMapKeyRef:
                     name: mastodon-env-tf
                     key: DB_PORT
               - name: DB01_NAME 
                 valueFrom:
                   configMapKeyRef:
                     name: mastodon-env-tf
                     key: DB_NAME
               - name: DB01_USER
                 valueFrom:
                   configMapKeyRef:
                     name: mastodon-env-tf
                     key: DB_USER
               - name: DB01_PASS 
                 valueFrom:
                   configMapKeyRef:
                     name: mastodon-env-tf
                     key: DB_PASS
         restartPolicy: OnFailure

What is the expected correct behavior?

pg_isready detects that the server is up, and the backup runs.
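For reference, the expected behavior can be sketched as a retry loop. This is a hypothetical Python illustration, not db-backup's actual implementation (which isn't shown here): it rebuilds the pg_isready command that worked interactively from the DB01_* variables and retries it, logging each exit status. pg_isready's documented exit statuses are 0 (accepting connections), 1 (rejecting connections, e.g. during startup), 2 (no response) and 3 (no attempt made, e.g. invalid parameters), so logging the code shows which case the automated run is hitting. The attempt count and delay are arbitrary assumptions.

```python
import os
import subprocess
import time
from typing import Callable, Dict, List, Mapping

# Exit statuses documented for pg_isready.
EXIT_MEANINGS: Dict[int, str] = {
    0: "accepting connections",
    1: "rejecting connections",
    2: "no response",
    3: "no attempt made",
}


def pg_isready_args(env: Mapping[str, str]) -> List[str]:
    """Rebuild the command that worked interactively, from the DB01_* variables."""
    return [
        "pg_isready",
        f"--host={env['DB01_HOST']}",
        f"--port={env['DB01_PORT']}",
        f"--dbname={env['DB01_NAME']}",
        f"--username={env['DB01_USER']}",
    ]


def run_pg_isready(args: List[str]) -> int:
    """Run pg_isready with the current environment and return its exit status."""
    return subprocess.run(args).returncode


def wait_until_ready(
    env: Mapping[str, str],
    attempts: int = 10,
    delay: float = 5.0,
    runner: Callable[[List[str]], int] = run_pg_isready,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry pg_isready until it exits 0, logging every status; False if it never does."""
    args = pg_isready_args(env)
    for attempt in range(1, attempts + 1):
        code = runner(args)
        print(f"attempt {attempt}/{attempts}: exit {code} ({EXIT_MEANINGS.get(code, 'unknown')})")
        if code == 0:
            return True
        if attempt < attempts:
            sleep(delay)
    return False


if __name__ == "__main__":
    required = ("DB01_HOST", "DB01_PORT", "DB01_NAME", "DB01_USER")
    missing = [name for name in required if name not in os.environ]
    if missing:
        print("missing variables: " + ", ".join(missing))
    else:
        raise SystemExit(0 if wait_until_ready(os.environ) else 1)
```

The runner and sleep parameters are injectable only so the loop can be exercised without a database.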

Relevant logs and/or screenshots

I've attached the debug logs with everything sensitive scrubbed: private-logs.txt

Environment

  • Image version / tag: tiredofit/db-backup:4.1.3
  • Host OS: k8s 1.30.2-do.0

Possible fixes

I've spent a fair amount of time debugging this and reached a point where it seemed best to open a bug to track my progress.

@seano-vs seano-vs added the bug label Aug 12, 2024
@seano-vs
Author

Update: the path given in this part of the README seems to be incorrect. It should be /etc/services.available/dbbackup-01/run instead of /etc/services.available/10-db-backup/run; at least, that's the path in my Docker image.

The container can now execute "run" as described in the docs, but pg_isready still fails when it runs automatically.
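Because the service directory name differed between the README and this image (10-db-backup versus dbbackup-01), it may be safer to look up the run script than to hard-code its path. A minimal Python sketch, assuming each service lives in its own directory under /etc/services.available with an executable run file, as both paths above suggest; the function name is hypothetical:

```python
import glob
import os
from typing import List


def find_service_run_scripts(root: str = "/etc/services.available") -> List[str]:
    """List executable <service>/run files directly under root, sorted by path."""
    pattern = os.path.join(root, "*", "run")
    return sorted(p for p in glob.glob(pattern) if os.access(p, os.X_OK))


if __name__ == "__main__":
    for path in find_service_run_scripts():
        print(path)
```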
