-
Notifications
You must be signed in to change notification settings - Fork 540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
migrate nrt to ctrl runtime client #655
migrate nrt to ctrl runtime client #655
Conversation
✅ Deploy Preview for kubernetes-sigs-scheduler-plugins canceled.
|
/assign |
c3ee7ed
to
723706e
Compare
0524bcf
to
d447dfc
Compare
@Huang-Wei CI passed, PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits. LGTM overall.
/approve
/lgtm
podLister: podLister, | ||
resyncMethod: resyncMethod, | ||
} | ||
return obj, nil | ||
} | ||
|
||
func (ov *OverReserve) GetCachedNRTCopy(nodeName string, pod *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, bool) { | ||
func (ov *OverReserve) GetCachedNRTCopy(ctx context.Context, nodeName string, pod *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems ctx is not used in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, either pass it through ov.Get
(which will become ov.client.Get
?) or discard it please (applies to similar places elsewhere in the code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is because we use the ctx here: https://github.com/kubernetes-sigs/scheduler-plugins/pull/655/files#diff-d469fc389aadc64511b0fcbf7af645119b04c14a1cc18f5d2307755df1c41a2eR57
and this function is part of the cache Interface here: https://github.com/kubernetes-sigs/scheduler-plugins/pull/655/files#diff-4792874d567196ef3ce188f2f3ccda73d46457d99d95111e04a0072dcf183375R35
so we have to change it to both DiscardReserved and OverReserve
BTW, Passthrough also used the ctx in GetCachedNRTCopy: https://github.com/kubernetes-sigs/scheduler-plugins/pull/655/files#diff-5747013d93a36239aaa4d659c8de3b9a2ef08bbd0f842377c25a1e810007b3e6R44
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, my bad. It would be nice to change it as
func (ov *OverReserve) GetCachedNRTCopy(_ context.Context, nodeName string, pod *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, bool) { }
because ctx
is indeed not needed in the body, but this is borderline a nit and totally not blocking
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Huang-Wei, zwpaper The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cc @ffromani @swatisehgal @Tal-or for awareness. The background is that scheduler-plugins start to use more CRDs as time goes, and the current way is to generate typed client for each CRD and leverage in the plugin codebase. The codegen part is annoying and raise the bar of using/understanding scheduler plugin. So the overall goal to turn to dynamic client by leverage controller runtime's utilities. cc-ing you is b/c it seems you're working on pushing NRT to upstream. Hope this work doesn't conflict with your work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR.
I left some notes from my side, and I would wait for @ffromani and @swatisehgal to review this PR as well.
@@ -43,32 +45,26 @@ const ( | |||
) | |||
|
|||
func initNodeTopologyInformer(tcfg *apiconfig.NodeResourceTopologyMatchArgs, handle framework.Handle) (nrtcache.Interface, error) { | |||
topoClient, err := topoclientset.NewForConfig(handle.KubeConfig()) | |||
scheme := runtime.NewScheme() | |||
utilruntime.Must(clientgoscheme.AddToScheme(scheme)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would handle this error instead of panic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this error in practice won't happen IMO. If we treat it as an error, @Tal-or do you mean we can find a way to proceed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why this commit (which seems to want to focus on test code) needs to add manage schemas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Huang-Wei Not sure if it's possible to proceed, but at least throwing a detailed error which bubble up and explained what happen, would be more user friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how the controller runtime client can know what to do with the types, this is a common usage from kube-builder, which is one of the most famous libs using controller runtime.
you can check: https://book.kubebuilder.io/cronjob-tutorial/empty-main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @Huang-Wei . The error is not recoverable, so it's fine to use utilruntime.Must
and fail badly in the (I reckon) exceptionally rare case on which AddToScheme fails.
The only thing which doesn't fully click to me is why we need to fix the schema in this precise code location. Disclosure: I didn't review the latest changes, so this is possibly already improved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ffromani raised a very good point. The AddToScheme logic should only happen once per its usage:
- for running scheduler, it's should in main's init() or equivalent place. @zwpaper I know this is generic code problem, but if you can add this logic in this PR, it'd be much appreciated 🙇♂️
- for UTs, it's fine to init in each UT as we treat each UT as a client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Huang-Wei moved all scheme initial into init
func, PTAL
# rg -B 1 -g '!vendor/' -g '!*_test.go' -- utilruntime.Must
pkg/noderesourcetopology/plugin.go
45-func init() {
46: utilruntime.Must(clientgoscheme.AddToScheme(scheme))
47: utilruntime.Must(topologyv1alpha2.AddToScheme(scheme))
pkg/networkaware/topologicalsort/topologicalsort.go
45-func init() {
46: utilruntime.Must(clientgoscheme.AddToScheme(scheme))
47: utilruntime.Must(agv1alpha.AddToScheme(scheme))
pkg/networkaware/networkoverhead/networkoverhead.go
66-func init() {
67: utilruntime.Must(clientgoscheme.AddToScheme(scheme))
68-
69: utilruntime.Must(agv1alpha1.AddToScheme(scheme))
70: utilruntime.Must(ntv1alpha1.AddToScheme(scheme))
pkg/generated/clientset/versioned/scheme/register.go
54- v1.AddToGroupVersion(Scheme, schema.GroupVersion{Version: "v1"})
55: utilruntime.Must(AddToScheme(Scheme))
pkg/generated/clientset/versioned/fake/register.go
54- v1.AddToGroupVersion(scheme, schema.GroupVersion{Version: "v1"})
55: utilruntime.Must(AddToScheme(scheme))
cmd/controller/app/server.go
37-func init() {
38: utilruntime.Must(clientgoscheme.AddToScheme(scheme))
39-
40: utilruntime.Must(schedulingv1a1.AddToScheme(scheme))
apis/config/scheme/scheme.go
44-func AddToScheme(scheme *runtime.Scheme) {
45: utilruntime.Must(config.AddToScheme(scheme))
46: utilruntime.Must(v1.AddToScheme(scheme))
47: utilruntime.Must(v1beta2.AddToScheme(scheme))
48: utilruntime.Must(v1beta3.AddToScheme(scheme))
apis/scheduling/scheme/scheme.go
31-func AddToScheme(scheme *runtime.Scheme) {
32: utilruntime.Must(schedv1alpha1.AddToScheme(scheme
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there could be some impl details to iterate over, but design wise moving to controller-runtime seems OK. Doesn't help us in general in our journey towards k8s core merge (or its proposal, at least), but its' manageable.
What puzzles me more is if this approach is still compatible with #599 .
Note that #599 is not necessarily something which I want to merge, but still the problem it attacks is real, and we didn't reach a conclusion which side should carry a fix. In addition, it would be nice to have the option to workaround the issue also on scheduler side if needed.
podLister: podLister, | ||
resyncMethod: resyncMethod, | ||
} | ||
return obj, nil | ||
} | ||
|
||
func (ov *OverReserve) GetCachedNRTCopy(nodeName string, pod *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, bool) { | ||
func (ov *OverReserve) GetCachedNRTCopy(ctx context.Context, nodeName string, pod *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, either pass it through ov.Get
(which will become ov.client.Get
?) or discard it please (applies to similar places elsewhere in the code)
) | ||
|
||
type Passthrough struct { | ||
lister listerv1alpha2.NodeResourceTopologyLister | ||
client ctrlclient.Client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likewise, we're not using embedding here so better be consistent in OverReserve
as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the OverReserve
to private client to keep it consistent.
@@ -43,32 +45,26 @@ const ( | |||
) | |||
|
|||
func initNodeTopologyInformer(tcfg *apiconfig.NodeResourceTopologyMatchArgs, handle framework.Handle) (nrtcache.Interface, error) { | |||
topoClient, err := topoclientset.NewForConfig(handle.KubeConfig()) | |||
scheme := runtime.NewScheme() | |||
utilruntime.Must(clientgoscheme.AddToScheme(scheme)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why this commit (which seems to want to focus on test code) needs to add manage schemas
scheme := runtime.NewScheme() | ||
utilruntime.Must(clientgoscheme.AddToScheme(scheme)) | ||
utilruntime.Must(topologyv1alpha2.AddToScheme(scheme)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while this is probably OK, it seems wasteful to do each time. Can we do it once when test runs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the tests run independently, how can we init once and run cross all the tests? maybe we have to refactor to use a test.M?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sort of agree with @zwpaper, in the perspective to making each UT hermetic.
scheme := runtime.NewScheme() | ||
utilruntime.Must(clientgoscheme.AddToScheme(scheme)) | ||
utilruntime.Must(topologyv1alpha2.AddToScheme(scheme)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
scheme := runtime.NewScheme() | ||
utilruntime.Must(clientgoscheme.AddToScheme(scheme)) | ||
utilruntime.Must(topologyv1alpha2.AddToScheme(scheme)) | ||
extClient, err := ctrlclient.New(globalKubeConfig, ctrlclient.Options{Scheme: scheme}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
c14580c
to
f4b63d2
Compare
Signed-off-by: Zhang Wei <[email protected]>
Signed-off-by: Zhang Wei <[email protected]>
f4b63d2
to
9471d79
Compare
9471d79
to
0b5755d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall.
A potential future improvement is to instantiate the scheme
once (in main.go) for non-test code, and then pass to all plugins that need it using a WithScheme option. But this doesn't block on this PR's merge.
@@ -28,7 +28,7 @@ import ( | |||
"k8s.io/apimachinery/pkg/runtime" | |||
"k8s.io/apimachinery/pkg/util/rand" | |||
utilruntime "k8s.io/apimachinery/pkg/util/runtime" | |||
"k8s.io/client-go/kubernetes/scheme" | |||
clientgoscheme "k8s.io/client-go/kubernetes/scheme" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: revert this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the scheme name is in conflict with the var scheme
for init schemes, changed to keep consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
(non-binding to let other tagged reviewers review)
podLister: podLister, | ||
resyncMethod: resyncMethod, | ||
} | ||
return obj, nil | ||
} | ||
|
||
func (ov *OverReserve) GetCachedNRTCopy(nodeName string, pod *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, bool) { | ||
func (ov *OverReserve) GetCachedNRTCopy(ctx context.Context, nodeName string, pod *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, my bad. It would be nice to change it as
func (ov *OverReserve) GetCachedNRTCopy(_ context.Context, nodeName string, pod *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, bool) { }
because ctx
is indeed not needed in the body, but this is borderline a nit and totally not blocking
nrtCandidate, err := ov.nrtLister.Get(nodeName) | ||
if err != nil { | ||
nrtCandidate := &topologyv1alpha2.NodeResourceTopology{} | ||
if err := ov.client.Get(context.Background(), types.NamespacedName{Name: nodeName}, nrtCandidate); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should use the same context for all the Get
s - and I think we should (and we should also make it cancellable as further extension). But this change is better served in a followup PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, other PRs also have some leftovers. The general idea would be use context wherever you can:
- the chained calls in PreFitler/Filter can use the context that is provided by scheduler framework
- for calls outside PreFilter/Filter/..., we should instantiate one at the start of main() (or make some change in scheduler framework to expose it), and pass it to all plugins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created an issue to track this #664
/lgtm |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
part of #485
Special notes for your reviewer:
after deleting the generated vendor, CI found noderesourcetopology crd file changed, v1alpha1 version was deleted, I checked and found nowhere that used the v1alpha1, so I deleted the version.
now we only have noderesourcetopology v1alpha2
Does this PR introduce a user-facing change?