Support Koordinator as one batch scheduler option #2572
base: master
Conversation
Is this PR ready for review? The PR title is still marked as "draft."
@kingeasternsun can you please add some tests before marking this ready for review? See the other batch scheduler implementations for reference.
Signed-off-by: kingeasternsun <[email protected]>
Thanks for your comment. It's still in development and currently lacks test code.
Thanks for your advice, I'll add the tests soon.
Signed-off-by: kingeasternsun <[email protected]>
Hey, everything is fine now.
Hi, tests have been added.
}

func analyzeGangGroupsFromApp(app *rayv1.RayCluster) ([]string, map[string]wokerGroupReplicas) {
	gangGroups := make([]string, 1+len(app.Spec.WorkerGroupSpecs))
nit: len(app.Spec.WorkerGroupSpecs)+1
Thank you for your review! I will fix it as soon as possible.
}

for i, workerGroupSepc := range app.Spec.WorkerGroupSpecs {
	gangGroups[1+i] = generateGangGroupName(app, workerGroupSepc.Template.Namespace, workerGroupSepc.GroupName)
nit: gangGroups[i+1]
Thank you for your review! I will fix it as soon as possible.
for i, workerGroupSepc := range app.Spec.WorkerGroupSpecs {
	gangGroups[1+i] = generateGangGroupName(app, workerGroupSepc.Template.Namespace, workerGroupSepc.GroupName)
	minMemberMap[workerGroupSepc.GroupName] = wokerGroupReplicas{
		Replicas: *(workerGroupSepc.Replicas),
nit: the brackets are not needed here
Thank you for your review! I will fix it as soon as possible.
	gangGroups[1+i] = generateGangGroupName(app, workerGroupSepc.Template.Namespace, workerGroupSepc.GroupName)
	minMemberMap[workerGroupSepc.GroupName] = wokerGroupReplicas{
		Replicas:    *(workerGroupSepc.Replicas),
		MinReplicas: *(workerGroupSepc.MinReplicas),
nit: the brackets are not needed here
Thank you for your review! I will fix it as soon as possible.
logger := ctrl.LoggerFrom(ctx).WithName(SchedulerName)

// when gang scheduling is enabled, extra annotations need to be added to all pods
if y.isGangSchedulingEnabled(app) {
if !y.isGangSchedulingEnabled(app) {
return
}
....
Thank you for your review! I will fix it as soon as possible.
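For illustration, here is a minimal sketch of what the suggested early-return shape could look like. The receiver type, the ray.io/gang-scheduling-enabled label, and the AddMetadataToPod signature are assumptions modelled on the other batch scheduler plugins, not necessarily this PR's final code.

package koordinator

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

const SchedulerName = "koordinator"

// KoordinatorScheduler is a placeholder receiver type for this sketch.
type KoordinatorScheduler struct{}

// isGangSchedulingEnabled assumes gang scheduling is toggled via a RayCluster label;
// the actual PR may use a different mechanism.
func (k *KoordinatorScheduler) isGangSchedulingEnabled(app *rayv1.RayCluster) bool {
	_, ok := app.Labels["ray.io/gang-scheduling-enabled"]
	return ok
}

// AddMetadataToPod uses the guard-clause style suggested above: return early
// when gang scheduling is disabled so the main body stays at one indent level.
func (k *KoordinatorScheduler) AddMetadataToPod(ctx context.Context, app *rayv1.RayCluster, groupName string, pod *corev1.Pod) {
	logger := ctrl.LoggerFrom(ctx).WithName(SchedulerName)

	if !k.isGangSchedulingEnabled(app) {
		return
	}

	logger.Info("adding gang scheduling annotations to pod", "group", groupName)
	// ... populate pod.Annotations with the gang / pod-group keys here ...
}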
},
)

setHeadPodNamespace(rayClusterWithGangScheduling, "ns0")
Why is this call needed? The namespace should be inherited from the RayCluster namespace, right?
You're absolutely right. However, to make this module's code more general, I considered that there might be future scenarios where the namespace for head pods or worker pods needs to be explicitly designated. So I added a fallback here: if the namespace for a head pod or worker pod is empty, it inherits the RayCluster's namespace; otherwise the specified namespace is used.
Of course, if you think we don't need to consider such special cases for now, I completely agree as well.
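To make the fallback concrete, here is a tiny hypothetical helper (the name is made up for illustration, not the PR's actual function) capturing that behaviour:

// resolveNamespace illustrates the fallback described above: an explicitly set
// pod namespace wins, otherwise the pod inherits the RayCluster's namespace.
func resolveNamespace(podNamespace, rayClusterNamespace string) string {
	if podNamespace != "" {
		return podNamespace
	}
	return rayClusterNamespace
}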
setHeadPodNamespace(rayClusterWithGangScheduling, "ns0")
addWorkerPodSpec(rayClusterWithGangScheduling, "workergroup1", "ns1", 4, 2)
addWorkerPodSpec(rayClusterWithGangScheduling, "workergroup2", "ns2", 5, 3)
Does Koordinator support having RayCluster pods spread across different namespaces? I don't think that's really supported with Kuberay
As mentioned in the documentation at https://koordinator.sh/docs/designs/gang-scheduling/#annotation-way, the Koordinator scheduler supports highly flexible GangGroup scheduling requirements, including the ability to support different pods in a GangGroup belonging to different namespaces.
Of course, if you believe such special scheduling requirements are unlikely to occur in KubeRay, I fully agree as well.
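For reference, here is a rough sketch (in Go, as the plugin would set them) of the pod annotations the linked design doc describes. The annotation keys are taken from that doc, while the gang names and namespaces are made up and should be double-checked against the Koordinator documentation.

// Illustrative only: keys follow the Koordinator gang scheduling design doc
// linked above; the gang names and namespaces are invented examples.
var podAnnotations = map[string]string{
	// Which gang this pod belongs to and how many members the gang needs.
	"gang.scheduling.koordinator.sh/name":          "raycluster-sample-workergroup1",
	"gang.scheduling.koordinator.sh/min-available": "2",
	// A GangGroup ties several gangs together; each entry is "namespace/gangName",
	// which is what lets pods of one group live in different namespaces.
	"gang.scheduling.koordinator.sh/groups": `["ns0/raycluster-sample-headgroup","ns1/raycluster-sample-workergroup1"]`,
}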
Can you add an example YAML similar to this one: https://github.com/ray-project/kuberay/blob/e9f31556c14fae6391fb27a4a96bfbe01f917d46/ray-operator/config/samples/ray-cluster.yunikorn-scheduler.yaml?
Please update main.go (lines 96 to 97 in e9f3155).
Please update the batch schedulers in the helm chart: kuberay/helm-chart/kuberay-operator/values.yaml (lines 60 to 87 in e9f3155).
Thank you for your review! I will add it as soon as possible.
Thank you for your review! I will fix it as soon as possible.
Thank you for your review! I will fix it as soon as possible.
Signed-off-by: kingeasternsun <[email protected]>
Why are these changes needed?
Koordinator is a QoS-based scheduling system for efficient orchestration of microservices, AI, and big data workloads on Kubernetes. It aims to improve the runtime efficiency and reliability of both latency-sensitive workloads and batch jobs, simplify the complexity of resource-related configuration tuning, and increase pod deployment density to improve resource utilization.
The integration is straightforward: Koordinator supports an annotation-based way of enabling gang scheduling, without requiring a PodGroup CR. Koordinator is also compatible with the community annotations pod-group.scheduling.sigs.k8s.io, pod-group.scheduling.sigs.k8s.io/name, and pod-group.scheduling.sigs.k8s.io/min-available. See https://koordinator.sh/docs/designs/gang-scheduling
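As a hedged illustration of that compatibility claim, the same pods could carry the upstream scheduler-plugins style annotations listed above and, per the linked doc, Koordinator should interpret them. The values below are made up; only the keys come from the documentation.

// Upstream-compatible annotation keys named above; the values are illustrative
// and would normally be filled in by the scheduler plugin.
var compatAnnotations = map[string]string{
	"pod-group.scheduling.sigs.k8s.io":               "raycluster-sample-pg",
	"pod-group.scheduling.sigs.k8s.io/name":          "raycluster-sample-pg",
	"pod-group.scheduling.sigs.k8s.io/min-available": "3",
}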
Related issue number
#2573
Checks