Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update controller-runtime to 0.16 #710

Merged
merged 5 commits into from
Dec 14, 2023
Merged

Update controller-runtime to 0.16 #710

merged 5 commits into from
Dec 14, 2023

Conversation

Zerpet
Copy link
Contributor

@Zerpet Zerpet commented Dec 1, 2023

Note to reviewers: remember to look at the commits in this PR and consider if they can be squashed
Note to contributors: remember to re-generate client set if there are any API changes

Summary Of Changes

  • Updated controller-runtime to 0.16.x
  • Refactored Topology Controller tests
  • Accepted changes suggested by GoLand code analysis tool
  • Make the integration test suite run in parallel

Additional Context

This version of controller runtime has breaking changes. Refactored the
Topology Controller tests because they were flaky. This group of tests
was defining their own queue reconciler in the same manager that has all
the controllers (included one for Queue types). This caused a "race"
between all setup reconcilers, and the test passed when the controller
registered in the specific test case was the first to reconcile, which
seemed to happen often enough to hide the flake.

After the refactor, each test case defines its own manager, which
watches a dedicated namespace for the Topology Controller tests, and
cancels/stops the manager at the end of each test case. This ensures
that only one controller is reconciling the Queue object. The
controllers defined in the BeforeSuite now watch only the default
namespace. All our tests were using the default namespace anyway, so
there are no failures as part of this subtle change.

@Zerpet Zerpet added this to the v1.13.0 milestone Dec 1, 2023
@Zerpet Zerpet self-assigned this Dec 1, 2023
@Zerpet Zerpet marked this pull request as draft December 4, 2023 16:37
@Zerpet Zerpet force-pushed the update-controller-runtime branch from 7fdb260 to e4b4ebc Compare December 11, 2023 17:06
@Zerpet Zerpet marked this pull request as ready for review December 11, 2023 17:11
This version of controller runtime has breaking changes. Refactored the
Topology Controller tests because they were flaky. This group of tests
was defining their own queue reconciler in the same manager that has all
the controllers (included one for Queue types). This caused a "race"
between all setup reconcilers, and the test passed when the controller
registered in the specific test case was the first to reconcile, which
seemed to happen often enough to hide the flake.

After the refactor, each test case defines its own manager, which
watches a dedicated namespace for the Topology Controller tests, and
cancels/stops the manager at the end of each test case. This ensures
that only one controller is reconciling the Queue object. The
controllers defined in the BeforeSuite now watch only the default
namespace. All our tests were using the default namespace anyway, so
there are no failures as part of this subtle change.

Signed-off-by: Aitor Perez Cedres <[email protected]>
Controllers now are scoped to a dedicated namespace. Controllers now are
started and stopped for each test case. This provides isolation from
other test suites, ensuring there's no environment pollution.

We also bumped Kubernetes to 1.26, because this version had some changes
regarding Pod Admission Controllers, and it will be the minimum
supported Kubernetes version in the next minor release.

Signed-off-by: Aitor Perez Cedres <[email protected]>
@Zerpet Zerpet force-pushed the update-controller-runtime branch from e4b4ebc to 2f6e50e Compare December 11, 2023 18:00
Execution time has cut down from 10 minutes to 1 minute and a few
seconds (!!!)

The solution is to start an API server for each parallel process,
listening on different ports. I tried to start only one API server and
share the API information among other parallel processes, but it was not
possible because the testEnv Environment does not implement any of the
marshall interfaces, so it is not possible to binary encode it. Binary
encoding is the method that Ginkgo uses to share information between
processes in the Synchronized Before/After suites.

Signed-off-by: Aitor Perez Cedres <[email protected]>
@Zerpet Zerpet force-pushed the update-controller-runtime branch from 2f6e50e to cad37eb Compare December 12, 2023 12:35
@Zerpet
Copy link
Contributor Author

Zerpet commented Dec 12, 2023

The SuperStream test suite is still flaking. Looking into it 👀

Use UID to verify whether an object has changed is more reliable than a
timestamp. A timestamp can be very close to each other if the recreate
operation is fast.

Signed-off-by: Aitor Perez Cedres <[email protected]>
@Zerpet
Copy link
Contributor Author

Zerpet commented Dec 13, 2023

This is ready for review after b8f7820 🥳

Copy link
Contributor

@DanielePalaia DanielePalaia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me!

I just noticed that we are removing the vuln detection in the Makefile.

Do we want to remove it for a specific reason?

@Zerpet
Copy link
Contributor Author

Zerpet commented Dec 14, 2023

Ah, I removed it locally because it was complaining about my local Go version and not being very useful at all. I'll add it back to the unit-tests target. IMO it's unnecessary to run them at unit and at integration level.

[skip ci]

Signed-off-by: Aitor Perez Cedres <[email protected]>
@Zerpet Zerpet merged commit 4093d3b into main Dec 14, 2023
@Zerpet Zerpet deleted the update-controller-runtime branch December 14, 2023 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants