In case of a database corruption, crash or any other issue requiring us to restore the database to a previous state, the OpenShift PostgresCluster
object allows us to restore the database from any point in time prior to the incident.
Prerequisites:
- The ENVIRONMENT (dev,test,prod) and CIF_NAMESPACE_PREFIX, GGIRCS_NAMESPACE_PREFIX, CIIP_NAMESPACE_PREFIX environment variables are set
- The machine is logged in onto the cluster where the CIF app is deployed (
oc login ...
)
Restoring the DB:
To restore the database to any point in time, simply run:
make restore TARGET_TIMESTAMP="2022-05-24 14:20:00-07"
where TARGET_TIMESTAMP follows the format 'YYYY-MM-DD HH:MM:SS-ZZ'
This will create a restore job with the postgres-operator.crunchydata.com/pgbackrest-restore
label.
Once this job is completed, the restore operation is completed, and the application can be redeployed to its original state.
db-pre-upgrade
job that might erase the database and revert the effects of the restore operation.
make install
This section goes in depth with how to manually restore the database, all of the following is automated in the above make
target.
To allow restoring with the helm chart, the bucket provisioning and database pre-upgrade jobs have been disabled directly in the chart when the restore option is enabled. A few more values need to be adjusted:
To avoid running some jobs and airflow DAGs:
# chart/cas-cif/values.yaml
deploy-db:
enabled: false
download-dags:
enabled: false
To enable the restore option,
# chart/cas-cif/values.yaml
db:
restore:
enabled: false
targetTimestamp: "2022-06-06 13:00:00-07" # or the point in time at which to restore
Finally, to deploy the chart and restore the db,
helm upgrade <Deployment Name> chart/cas-cif/
The full documentation is available here