Investigate the impact of the auto-scale-down jobs #157

Closed
WadeBarnes opened this issue Jan 23, 2024 · 4 comments

@WadeBarnes
Member

WadeBarnes commented Jan 23, 2024

Platform services has started running jobs that scale down any pods that have not been updated (rolled out) in over a year. These scripts will be run every Tuesday from now on.
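To make the scale-down criterion concrete, here is a minimal sketch of the kind of check described above. It is an illustration only, not Platform Services' actual script: the 365-day threshold, the function names, and the sample workload names and dates are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "over a year" is treated as more than 365 days.
STALE_AFTER = timedelta(days=365)


def is_stale(last_rollout: datetime, now: datetime,
             threshold: timedelta = STALE_AFTER) -> bool:
    """Return True when the last rollout is older than the threshold."""
    return now - last_rollout > threshold


def select_stale(last_rollouts: dict[str, datetime], now: datetime) -> list[str]:
    """Return the names of workloads whose last rollout is stale, sorted."""
    return sorted(name for name, ts in last_rollouts.items() if is_stale(ts, now))


if __name__ == "__main__":
    # Hypothetical workloads and rollout dates, for illustration only.
    now = datetime(2024, 1, 23, tzinfo=timezone.utc)
    sample = {
        "example-db": datetime(2022, 11, 1, tzinfo=timezone.utc),
        "example-api": datetime(2023, 12, 15, tzinfo=timezone.utc),
    }
    print(select_stale(sample, now))
```

Running the same check against our own namespaces ahead of each Tuesday run would show which pods are at risk before they are scaled down.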

The idea is to eliminate abandoned projects and free the associated resources, as well as to encourage best practices around pod/application maintenance.

The recommended best practice is to rebuild and redeploy application pods at least once a month in order to pick up updates and patches applied to the base image(s). This will have knock-on effects in some of our projects, such as those dependent on aca-py images.

As a workaround, application pods can be rolled out; this updates the resource manifests to include the current date.
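The rollout workaround could be scripted along these lines. This is a sketch, not a tested procedure: it only builds the `oc` command line and leaves running it to the caller. The helper name `rollout_command` and the example workload and namespace names are assumptions. `oc rollout latest` is the form used for DeploymentConfigs and `oc rollout restart` the form used for Deployments; which one applies depends on how each workload is defined.

```python
def rollout_command(kind: str, name: str, namespace: str) -> list[str]:
    """Build the oc command that triggers a new rollout of a workload.

    kind: "dc" for a DeploymentConfig, "deployment" for a Deployment.
    """
    if kind == "dc":
        return ["oc", "rollout", "latest", f"dc/{name}", "-n", namespace]
    if kind == "deployment":
        return ["oc", "rollout", "restart", f"deployment/{name}", "-n", namespace]
    raise ValueError(f"unsupported kind: {kind!r}")


if __name__ == "__main__":
    # Hypothetical workload and namespace names, for illustration only.
    print(" ".join(rollout_command("dc", "example-app", "example-namespace")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from a session that is already logged in to the cluster.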

For now, we want to review the pods that did get scaled down and identify what's needed to update them. We also want to identify what other pods may have been scaled down, since there are some services in the tools and deployment environments we don't actively monitor.

A separate ticket will be opened to discuss and design the update strategy moving forward.

@WadeBarnes WadeBarnes self-assigned this Jan 23, 2024
@WadeBarnes WadeBarnes converted this from a draft issue Jan 23, 2024
@WadeBarnes
Member Author

WadeBarnes commented Jan 23, 2024

Summary:

Backup containers

  • Update to the latest version.

Databases

  • Short term, rebuild to pick up the latest base images. Long term, upgrade databases to a newer version of Postgres.
  • Many are S2I builds, which are used to pick up server configuration files.

S2I Builds

  • Short term, update the base containers where possible and rebuild the application images.
  • Determine the alternative to S2I base images. It appears Red Hat is not supporting S2I containers to the same level anymore. There are Fedora-based S2I images available here: https://quay.io/organization/fedora, but the long-term plan should likely be to migrate away from using S2I base images.
  • My comment about Red Hat not supporting S2I images as much anymore was incorrect. They've just made it painfully difficult to find them (i.e., their search feature is lacking in some areas); UBI-based S2I base images can be found here, and Postgres S2I images can be found here. Note there is no Postgres 14 image from Red Hat. We've been using the one from Fedora, which is built from the same source: https://quay.io/repository/fedora/postgresql-14

Email verification services specifically

  • Short term, redeploy existing services.
  • Short to medium term, retire and archive.

BC Registries FDW Database

  • Used for connecting to the COLIN databases.
  • Requires a fair amount of updating to use medium to long term.
  • Look for alternatives.
  • Short term, rebuild and deploy. Medium to long term, look for alternatives, or update.

Others

  • Including controller-buybc, aries-endorser-api, and issuer-admin-bcvcpilot
  • Update as indicated.
  • Rebuild and redeploy.

Details

Monitored Applications Affected:

  • e79518-dev (Digital Trust Services Trust Over IP)
    • controller-buybc
  • 4a9599-test (Digital Trust Shared Service)
    • aries-endorser-backup
    • aries-endorser-db
    • aries-endorser-api
    • aries-endorser-wallet
      • Uses aries-endorser-db image

Others Affected

  • e79518-test (Digital Trust Services Trust Over IP)
  • a99fd4-dev (Digital Trust Demo Apps)
  • a99fd4-test (Digital Trust Demo Apps)
  • 8ad0ea-dev (OrgBook BC)
  • 8ad0ea-test (OrgBook BC)
    • backup-bc
      • See backup-bc in 8ad0ea-dev
  • 7cba16-dev (BC Registries Agent)
    • event-processor-log-db-primary
      • Uses same image as wallet-primary
    • backup-primary
    • wallet-primary
    • event-db-primary
      • Uses same image as wallet-primary
  • 7cba16-test (BC Registries Agent)
    • wallet-primary
      • See wallet-primary in 7cba16-dev
    • event-processor-log-db-primary
      • Uses same image as wallet-primary
    • event-db-primary
      • Uses same image as wallet-primary
    • bc-reg-fdw-primary
    • backup-primary
      • See backup-primary in 7cba16-dev
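Because several of the deployments listed above share an image, rebuilding one image covers more than one deployment. The sketch below records that relationship for 7cba16-dev, as listed above, and groups deployments by the image they use. The dictionary layout and the function name `deployments_by_image` are illustrative assumptions, and images are identified here by the name of the deployment they are listed under, not by their real image names.

```python
from collections import defaultdict

# Deployment -> image it uses in 7cba16-dev, per the list above.
# Image labels are placeholders (the owning deployment's name),
# not actual image names.
USES_IMAGE_7CBA16_DEV = {
    "wallet-primary": "wallet-primary",
    "event-db-primary": "wallet-primary",
    "event-processor-log-db-primary": "wallet-primary",
    "backup-primary": "backup-primary",
}


def deployments_by_image(uses_image: dict[str, str]) -> dict[str, list[str]]:
    """Group deployment names by the image they depend on."""
    groups: dict[str, list[str]] = defaultdict(list)
    for deployment, image in uses_image.items():
        groups[image].append(deployment)
    return {image: sorted(names) for image, names in groups.items()}


if __name__ == "__main__":
    grouped = deployments_by_image(USES_IMAGE_7CBA16_DEV)
    for image, names in sorted(grouped.items()):
        print(f"rebuild {image}: redeploy {', '.join(names)}")
```

Grouping this way keeps the rebuild list short: one rebuilt database image is followed by a redeploy of every deployment that uses it.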

@WadeBarnes
Member Author

I've spun the application pods back up and reviewed the environments for any other containers that were spun down. Next step is to review and identify what can be done to update the affected application pods.

@WadeBarnes
Member Author

Summary here: #157 (comment)

@WadeBarnes
Member Author

Closing this. The investigation is complete. Addressing the issues is covered by #158

@github-project-automation github-project-automation bot moved this from In Progress to In Review in CDT Enterprise Apps Feb 1, 2024