Update bootimage management enhancement #1698

yuqi-zhang · 2024-10-10T17:53:46Z

Add some timeline and enforcement options.

openshift-ci · 2024-10-10T17:53:50Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

wking · 2024-10-30T21:53:21Z

enhancements/machine-config/manage-boot-images.md

+
+#### Enforcement options
+
+Some combination of the following mechanisms should be implemented to alert users, particularly non-machineset backed scaled environments. The options generally fall under proactive enforcement (require users to updated and acknowledge upon upgrading to a new version) vs. reactive enforcement (only fail when a non-compliant bootimage is being used to scale into the cluster).


"require users to updated and acknowledge upon upgrading to a new version" is still reactive, isn't it? The admin ack approach is proactive, so the admin is aware (and ideally can set things up ahead of time), before updating from 4.y to a 4.(y+1) that would require a newer boot image. Can we change "updated" to "update" and "upon" to "before" here?

wking · 2024-10-30T22:03:32Z

enhancements/machine-config/manage-boot-images.md

+4. Add a service to be shipped via RHCOS/MCO templates, which will do a check on incoming OS container image vs currently booted RHCOS version. This runs on firstboot right after the MCD pulls the new image, and will prevent the node to rebase to the updated image if the drift is too far.
+
+
+RHEL major versions will no longer be cross-compatible. i.e. if you wish to have a RHEL10 machineconfigpool, you must use a RHEL10 bootimage.


This creates a skew issue, right? E.g.

1. Cluster is happy on an OCP that likes RHEL 9.
2. ClusterVersion update requested for a new release that likes RHEL 10.
3. MCO sets the MachineConfigPools up for a transition from the RHEL 9 target to the RHEL 10 target.
   - Up through this point, MCO will target RHEL 9, and if you have a RHEL 10 boot image, the MCP will fail to scale.
4. The first node being updated in the MCP successfully goes Ready=True (with disk space, and the other things that MCPs watch to decide the node is happy) on RHEL 10.
   - From this point on, MCO will target RHEL 10 for new nodes scaling into this MCP, and if you have a RHEL 9 boot image, the MCP will fail to scale.

Trying to time the bootimage bump to exactly match the moment when the "MCP associated with these MachineSets has decided new nodes will be RHEL 10" seems tricky. To avoid that timing issue, you'd need either RHEL 10 targets to be more flexible about boot image matching (e.g. both RHEL 9 and RHEL 10 boot images would work with RHEL 10 MCPs), or some way to select between RHEL 9 and RHEL 10 boot images at Machine-creation time depending on what the target MCP was expecting. Even then, there would still be a race if you selected a RHEL 9 boot image but the MCP got a happy RHEL 10 node before the new RHEL 9 Machine made its MCS Ignition request.
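To make the skew concrete, here is a minimal Python sketch of the two matching rules discussed above. It is a toy model, not MCO code: `Pool`, `scale_up_ok`, and `race_fails` are hypothetical names, and the "flexible" rule (accept the pool's target major or the one before it) is only one assumed reading of "both RHEL 9 and RHEL 10 boot images would work with RHEL 10 MCPs".

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Toy stand-in for a MachineConfigPool (hypothetical, not the MCO API)."""
    target_major: int  # RHEL major version that new nodes are expected to run


def scale_up_ok(boot_major: int, pool: Pool, strict: bool = True) -> bool:
    """Return True if a Machine booted from `boot_major` may join `pool`.

    strict=True  models the proposed rule: boot image major must equal
                 the pool's target major.
    strict=False models the assumed flexible alternative: the target major
                 or the one immediately before it is accepted.
    """
    if strict:
        return boot_major == pool.target_major
    return pool.target_major - 1 <= boot_major <= pool.target_major


def race_fails(strict: bool) -> bool:
    """Replay the race: the boot image is picked at Machine-creation time,
    then the pool flips to RHEL 10 before the Ignition request arrives."""
    pool = Pool(target_major=9)
    boot_major = pool.target_major   # chosen to match the pool at creation time
    pool.target_major = 10           # first updated node goes Ready on RHEL 10
    return not scale_up_ok(boot_major, pool, strict=strict)


if __name__ == "__main__":
    print("strict rule loses the race:", race_fails(strict=True))
    print("flexible rule loses the race:", race_fails(strict=False))
```

Under these assumptions, selecting the boot image at Machine-creation time does not help the strict rule, because the pool's target can change between selection and the Ignition request. Only relaxing the match removes the window.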

openshift-bot · 2024-11-28T01:15:56Z

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2024-12-05T08:45:45Z

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot · 2024-12-13T00:15:28Z

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

openshift-ci · 2024-12-13T00:15:45Z

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

yuqi-zhang · 2024-12-13T22:47:07Z

/reopen
/remove-lifecycle rotten
/lifecycle frozen

openshift-ci · 2024-12-13T22:47:20Z

@yuqi-zhang: Reopened this PR.

In response to this:

/reopen
/remove-lifecycle rotten
/lifecycle frozen

openshift-ci · 2024-12-13T22:47:23Z

@yuqi-zhang: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/reopen
/remove-lifecycle rotten
/lifecycle frozen

openshift-ci · 2024-12-13T22:47:34Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from yuqi-zhang. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Update bootimage management enhancement

6aa7029

Add some timeline and enforcement options.

openshift-ci bot added the do-not-merge/work-in-progress label Oct 10, 2024

wking reviewed Oct 30, 2024

openshift-ci bot added the lifecycle/stale label Nov 28, 2024

openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Dec 5, 2024

openshift-ci bot closed this Dec 13, 2024

openshift-ci bot reopened this Dec 13, 2024

openshift-ci bot removed the lifecycle/rotten label Dec 13, 2024
