Update controller-runtime to 0.16 #710
This version of controller-runtime has breaking changes.

Refactored the Topology Controller tests because they were flaky. These tests defined their own Queue reconciler in the same manager that runs all the controllers (including one for Queue types). This caused a "race" between all the registered reconcilers: a test passed when the controller registered in that specific test case was the first to reconcile, which happened often enough to hide the flake.

After the refactor, each test case defines its own manager, which watches a namespace dedicated to the Topology Controller tests, and cancels/stops that manager at the end of the test case. This ensures that only one controller reconciles the Queue object. The controllers defined in the BeforeSuite now watch only the default namespace. All our tests were already using the default namespace, so this subtle change causes no failures.

Signed-off-by: Aitor Perez Cedres <[email protected]>
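The per-test-case lifecycle described above can be sketched with the standard library alone. This is a toy model, not the code from this PR: `manager`, `event`, and `runTestCase` are hypothetical names, and the namespace filter stands in for a real manager whose cache is restricted to one namespace. It relies only on the fact that a controller-runtime manager's `Start` blocks until its context is cancelled.

```go
package main

import (
	"context"
	"fmt"
)

// event is a hypothetical change notification for an object in a namespace.
type event struct{ Namespace, Name string }

// manager is a toy stand-in for a controller manager scoped to one namespace.
type manager struct {
	namespace  string
	reconciled []string
}

// start handles events until ctx is cancelled and ignores other namespaces.
func (m *manager) start(ctx context.Context, events <-chan event) {
	for {
		select {
		case <-ctx.Done():
			return
		case ev := <-events:
			if ev.Namespace == m.namespace {
				m.reconciled = append(m.reconciled, ev.Name)
			}
		}
	}
}

// runTestCase creates a fresh manager, feeds it events, then stops it and
// waits for it to exit, so nothing leaks into the next test case.
func runTestCase(namespace string, events []event) []string {
	ctx, cancel := context.WithCancel(context.Background())
	m := &manager{namespace: namespace}
	ch := make(chan event) // unbuffered: each send waits for the manager
	done := make(chan struct{})
	go func() {
		defer close(done)
		m.start(ctx, ch)
	}()
	for _, ev := range events {
		ch <- ev
	}
	cancel() // end of the test case: stop the manager
	<-done   // wait until it has really stopped before reading its state
	return m.reconciled
}

func main() {
	got := runTestCase("topology-test", []event{
		{Namespace: "topology-test", Name: "queue-a"},
		{Namespace: "default", Name: "queue-b"}, // outside the watched namespace
	})
	fmt.Println(got)
}
```

Waiting on `done` after `cancel()` is the part that gives isolation: the next test case cannot start while a previous manager is still reconciling.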
Controllers are now scoped to a dedicated namespace, and are started and stopped for each test case. This isolates them from other test suites and ensures there is no environment pollution. We also bumped Kubernetes to 1.26, because that version changed Pod Admission Controllers, and it will be the minimum supported Kubernetes version in the next minor release. Signed-off-by: Aitor Perez Cedres <[email protected]>
Execution time has been cut down from 10 minutes to 1 minute and a few seconds (!!!) The solution is to start an API server for each parallel process, each listening on a different port. I tried to start only one API server and share its connection information with the other parallel processes, but that was not possible because the testEnv Environment does not implement any of the marshalling interfaces, so it cannot be binary encoded. Binary encoding is the method that Ginkgo uses to share information between processes in the Synchronized Before/After suites. Signed-off-by: Aitor Perez Cedres <[email protected]>
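As a rough illustration of "one API server per parallel process, each on its own port", here is a minimal sketch that derives a distinct loopback address from a 1-based process index (the kind of index Ginkgo exposes through `GinkgoParallelProcess()`). The helper `apiServerAddress` and the base port are assumptions made for this example; the PR does not show how the ports are actually chosen.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// apiServerAddress returns a loopback address that is unique per process.
// process is expected to be 1-based; basePort is a hypothetical starting port.
func apiServerAddress(basePort, process int) (string, error) {
	if process < 1 {
		return "", fmt.Errorf("process index must be >= 1, got %d", process)
	}
	port := basePort + process - 1
	if port < 1 || port > 65535 {
		return "", fmt.Errorf("port %d is out of range", port)
	}
	return net.JoinHostPort("127.0.0.1", strconv.Itoa(port)), nil
}

func main() {
	const basePort = 6443 // hypothetical base port for the first process
	for process := 1; process <= 3; process++ {
		addr, err := apiServerAddress(basePort, process)
		if err != nil {
			panic(err)
		}
		fmt.Printf("process %d -> %s\n", process, addr)
	}
}
```

A fixed offset is easy to reason about but can collide with ports already in use on the machine; asking the operating system for a free port is a common alternative.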
The SuperStream test suite is still flaking. Looking into it 👀
Using the UID to verify whether an object has changed is more reliable than using a timestamp. The timestamps of the old and new objects can be very close to each other if the recreate operation is fast. Signed-off-by: Aitor Perez Cedres <[email protected]>
This is ready for review after b8f7820 🥳
It looks good to me!
I just noticed that we are removing the vuln detection in the Makefile.
Do we want to remove it for a specific reason?
Ah, I removed it locally because it was complaining about my local Go version and was not very useful. I'll add it back to the unit-tests target. IMO it's unnecessary to run it at both the unit and the integration level.
[skip ci] Signed-off-by: Aitor Perez Cedres <[email protected]>
Note to reviewers: remember to look at the commits in this PR and consider if they can be squashed
Note to contributors: remember to re-generate client set if there are any API changes