Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add kep md #845

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

feat: add kep md #845

wants to merge 2 commits into from

Conversation

LY-today
Copy link

What would you like to be added?

What is your proposal:
The NodeResourcesFit plug-in of native k8s can only adopt a type of strategy for all resources, such as MostRequestedPriority and LeastRequestedPriority. However, in industrial practice, this design does not apply to some scenarios. For example: In AI scenarios, businesses that apply for GPUs prefer to occupy the entire GPU machine first to prevent GPU fragmentation; businesses that apply for CPU & MEM are prioritized and dispersed to non-GPU machines to prevent excessive consumption of CPU & MEM on GPU machines, resulting in real tasks of applying for GPUs. Pending due to insufficient non-GPU resources
. It is therefore hoped that both strategies can be extended to address this business need.

Why is this needed:
There are related descriptions above

Is there a suggested solution, if so, please add it:

plugin-one

config:

resources: 
  nvidia.com/gpu:
    type: MostAllocated
    weight: 2
  cpu:
    type: LeastAllocated
    weight: 1
  memory:
    type: LeastAllocated
    weight: 1

config description:
image

node score:

finalScoreNode = [(weight1 * resource1) + (weight2 * resource2) + … + (weightN* resourceN)] /(weight1+weight2+ … +weightN)

plugin-two

config:

resources: 
- nvidia.com/gpu 

config description:
image

node score:

finalScoreNode = (allocatablesResourcesNum - requestsResourcesNum) * framework.MaxNodeScore / allocatablesResourcesNum

Why is this needed?

It’s introduced above

Signed-off-by: LY-today <[email protected]>
@k8s-ci-robot
Copy link
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Dec 24, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: LY-today
Once this PR has been reviewed and has the lgtm label, please assign huang-wei for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 24, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @LY-today. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 24, 2024
Copy link

netlify bot commented Dec 24, 2024

Deploy Preview for kubernetes-sigs-scheduler-plugins canceled.

Name Link
🔨 Latest commit 956d7f5
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-scheduler-plugins/deploys/676a63e8cf3a17000811db0a

@LY-today
Copy link
Author

@googs1025 KEP

Signed-off-by: LY-today <[email protected]>
@LY-today LY-today changed the title feat: add kep feat: add kep md Dec 24, 2024
@LY-today
Copy link
Author

@googs1025 KEP

@googs1025 Can you help advance this MR?

@googs1025
Copy link
Member

@googs1025 KEP

@googs1025 Can you help advance this MR?

Thanks for the invite, I'll handle this on weekend :)

@LY-today
Copy link
Author

@googs1025 KEP

@googs1025 Can you help advance this MR?

Thanks for the invite, I'll handle this on weekend :)

thank you for your time

@@ -0,0 +1,114 @@
# Node Resource Fit plus Scheduling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very similar to an old plugin. Can you help to tell the difference or integrate it?
FYI: https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/kep/48-node-resources-allocatable-scoring

Copy link
Author

@LY-today LY-today Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@googs1025 Older version policies can only use one policy for different resources. Not suitable for complex resource scenarios, such as AI

Copy link
Author

@LY-today LY-today Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@googs1025 Under the AI ​​cluster. It is hoped that GPU tasks will be scheduled on one GPU machine as much as possible, and CPU tasks will be scattered on CPU machines. However, the old version of the policy does not support using different policies for the two resources.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned, it seems to be very similar to the previous nodeResourcesAllocatable, and I don't think it needs to be extended with a new plugin. If it is possible, can it be integrated into the original plugin? 🤔

Copy link
Author

@LY-today LY-today Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@googs1025 What you mean is that you agree with the design of the NodeResourcesFitPlus strategy, but you want to implement it by modifying the original nodeResourcesAllocatable strategy?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@googs1025 What you mean is that you agree with the design of the NodeResourcesFitPlus strategy, but you want to implement it by modifying the original nodeResourcesAllocatable strategy?

@googs1025 Do I understand correctly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not recommended to use screenshots of tables and pictures.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to make adjustments


## Summary

The NodeResourcesFit plug-in of native k8s can only adopt a type of strategy for all resources, such as MostRequestedPriority and LeastRequestedPriority. However, in industrial practice, this design does not apply to some scenarios. For example: In AI scenarios, businesses that apply for GPUs prefer to occupy the entire GPU machine first to prevent GPU fragmentation; businesses that apply for CPU & MEM are prioritized and dispersed to non-GPU machines to prevent excessive consumption of CPU & MEM on GPU machines, resulting in real tasks of applying for GPUs. Pending due to insufficient non-GPU resources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the scenario of using gpu nodes, which are scarce resources, we should directly filter out the gpu nodes. Shouldn't this reduce the score? In addition, IIUC, gpu nodes (or other devices) are labeled (based on gpu-operator or nfd), and are generally filtered in this way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Affinity strategies or nodeSelector require labeling nodes in advance, which is costly for cluster maintainers. The advantage of the strategy is to reduce this maintenance operation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think quite the opposite, we should have provided feature tags for device-specific nodes (eg: nvidia.com/gpu.xxx). 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understand, labels can indeed be printed to distinguish machine types. Of course, it can also be done using the Affinity strategy. But what I want to say is that the above process has costs at the industrial practice level. 100 heterogeneous resources require the maintenance cost of 100 sets of labels.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@googs1025 If you think that maintenance cost is not something that k8s needs to consider, then indeed the second expansion strategy does not need to be incorporated.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@googs1025 Does the ScarceResourceAvoidance strategy have a clear conclusion? Accept or not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not for me to decide and can be left to other maintainers to suggest.

Copy link
Author

@LY-today LY-today Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@googs1025 Thank you for your feedback. Can you help me let other students review it?

@LY-today
Copy link
Author

@googs1025 Hello, do you have any clear plans for these two plugins?

@LY-today
Copy link
Author

LY-today commented Jan 2, 2025

@swatisehgal @zwpaper Please check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants