Update bootimage management enhancement #1698
Add some timeline and enforcement options.
Skipping CI for Draft Pull Request.
#### Enforcement options
Some combination of the following mechanisms should be implemented to alert users, particularly non-machineset backed scaled environments. The options generally fall under proactive enforcement (require users to updated and acknowledge upon upgrading to a new version) vs. reactive enforcement (only fail when a non-compliant bootimage is being used to scale into the cluster).
"require users to updated and acknowledge upon upgrading to a new version" is still reactive, isn't it? The admin ack approach is proactive, so the admin is aware (and ideally can set things up ahead of time), before updating from 4.y to a 4.(y+1) that would require a newer boot image. Can we change "updated" to "update" and "upon" to "before" here?
4. Add a service, shipped via RHCOS/MCO templates, that checks the incoming OS container image against the currently booted RHCOS version. It runs on firstboot right after the MCD pulls the new image, and prevents the node from rebasing to the updated image if the drift is too large.
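A minimal sketch of the firstboot check described in item 4, assuming versions can be reduced to a plain `major.minor` pair. The `OSVersion` type, the `parse_version` and `may_rebase` helpers, and the `MAX_MINOR_DRIFT` threshold are all hypothetical; the proposal does not define how drift is measured or what counts as too far.

```python
from dataclasses import dataclass

# Hypothetical policy value: how many minor versions the booted image may lag
# behind the incoming OS image. The proposal does not specify a number.
MAX_MINOR_DRIFT = 2


@dataclass(frozen=True)
class OSVersion:
    major: int  # RHEL major version, e.g. 9
    minor: int  # RHEL minor version, e.g. 4


def parse_version(text: str) -> OSVersion:
    """Parse a 'major.minor' string such as '9.4' (assumed format)."""
    major, minor = text.split(".")[:2]
    return OSVersion(int(major), int(minor))


def may_rebase(booted: OSVersion, incoming: OSVersion,
               max_minor_drift: int = MAX_MINOR_DRIFT) -> bool:
    """Return True if the node may rebase from `booted` to `incoming`."""
    # A different RHEL major version is treated as incompatible.
    if booted.major != incoming.major:
        return False
    # Block the rebase when the booted image lags too far behind.
    return incoming.minor - booted.minor <= max_minor_drift
```

With these assumed values, a node booted on 9.2 could rebase to 9.4, while a node booted on 9.0 would be stopped before rebasing to 9.4.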
RHEL major versions will no longer be cross-compatible, i.e. if you wish to have a RHEL10 machineconfigpool, you must use a RHEL10 bootimage.
This creates a skew issue, right? E.g.
- Cluster is happy on an OCP that likes RHEL 9.
- ClusterVersion update requested for a new release that likes RHEL 10.
- MCO sets the MachineConfigPools up for a transition from the RHEL 9 target to the RHEL 10 target.
- Up through this point, MCO will target RHEL 9, and if you have a RHEL 10 boot image, the MCP will fail to scale.
- The first node being updated in the MCP successfully goes Ready=True (with disk space, and the other things that MCPs watch to decide the node is happy) on RHEL 10.
- From this point on, MCO will target RHEL 10 for new nodes scaling into this MCP, and if you have a RHEL 9 boot image, the MCP will fail to scale.
Trying to time the bootimage bump to exactly match the moment when the "MCP associated with these MachineSets has decided new nodes will be RHEL 10" seems tricky. But to avoid that timing issue, you'd either need RHEL 10 targets to be more flexible about boot image matching (e.g. both RHEL 9 and RHEL 10 boot images would work with RHEL 10 MCPs), or some way to select from RHEL 9 or RHEL 10 boot images at Machine-creation time depending on what the target MCP was expecting (and even then, there would still be a race if you selected a RHEL 9 boot image but the MCP got a happy RHEL 10 node before the new RHEL 9 Machine made its MCS Ignition request).
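The skew can be sketched as a small model, assuming the MCP target flips from the old to the new major version once the first node is Ready. The function names and the "accept the previous major" rule are hypothetical illustrations of the two matching policies discussed here, not MCO behavior, and the Machine-creation race is not modeled.

```python
def strict_match(boot_major: int, target_major: int) -> bool:
    """Strict policy: boot image major must equal the MCP target major."""
    return boot_major == target_major


def flexible_match(boot_major: int, target_major: int) -> bool:
    """Hypothetical relaxed policy: a target also accepts the previous major."""
    return boot_major in (target_major - 1, target_major)


def mcp_target(old_major: int, new_major: int, first_node_ready: bool) -> int:
    """Assumed model: the MCP targets the new major once one node is Ready."""
    return new_major if first_node_ready else old_major


def works_throughout(boot_major: int, matcher,
                     old_major: int = 9, new_major: int = 10) -> bool:
    """True if the boot image can scale both before and after the flip."""
    return all(
        matcher(boot_major, mcp_target(old_major, new_major, ready))
        for ready in (False, True)
    )
```

Under strict matching, neither a RHEL 9 nor a RHEL 10 boot image passes `works_throughout`, which is the timing problem described above. Under the relaxed policy, a RHEL 9 boot image passes in both phases.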
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/reopen
@yuqi-zhang: Reopened this PR. In response to this:
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
cc @dustymabe @jlebon