proposal: splitting the controller into multiple deployments #13517
UX complexity is significant -- Istio as an example of prior art of the opposite

I've thought about this before, but there is actually prior art in the opposite direction. Of particular note is Istio, which in its 1.5 release specifically transitioned to a single binary. Given my experience with Istio, other Istio users' experience, and the maintainers' experience in continuing to simplify the deployment (e.g. in 1.6 with SDS and agents, all the way through to Ambient Mesh, which went beta in 1.22), I would suggest against this unless there is specific user demand for it.

Note that k8s itself has been infamous for deployment complexity, and that has similarly been reduced over the years. The number of services/Charts people install per cluster has increased, though, enough to be its own source of complexity -- this would add to that.

Also note that this will substantially complicate UX when it comes to support or reporting issues. Our issue templates already don't handle the difference between Controller, Executor, and Server issues well, and this will multiplicatively complicate that. Similarly, users are not always aware of these different components and how they interact, so splitting into multiple more will make things more complicated for them than they already are.

memory increase due to multiple informers watching the same resources
If you were to fully split things like TTL, archiving, and PodGC, there would indeed be duplicative informers, as they watch the same resources. This could actually result in a substantial memory increase, since the informers tend to account for one of the largest proportions of memory. The memory increase is multiplicative with each separate Controller that watches the same set of resources. We already have users with very high memory usage (100GB+), so I don't think they would like to increase that further. k8s API issues are even more common, so taking care when increasing that usage is also important.

For smaller installations, this and splitting into multiple Pods would also be significant, as the baseline cost of running a process, the Go runtime, and potentially duplicate informers becomes higher.

It would be better to split per informer / per resource as such. That way the difference is more minimal for larger installations (instead of multiplicative), although it could still be quite significant for smaller installations.

can optimize memory usage of an informer, but would not recommend it unless absolutely necessary

Caveat: there is a way to optimize informer memory usage for the same resources but disjoint information within the same resource. I researched that quite a bit for my sharding proposal, eventually found a way to do it, and implemented a variant of that in #11855 (comment). It's not ideal though, and there is still higher read pressure on the k8s API and higher network usage, so I wouldn't necessarily suggest using this method regularly. In that PR it is primarily a workaround for the lack of labels on Semaphore ConfigMaps; as a result, the informer is on all ConfigMaps, which could lead to very high memory usage without optimization.
My suggestion in that PR was to require labels on all Argo-used ConfigMaps moving forward (we do for many of them, but not quite all of them, like Semaphores) to avoid this issue in future versions (it's a breaking change otherwise, so it can't be done in a patch).

supervisors are a can of worms
I would not recommend this either -- k8s and Docker do not lend themselves well to multiple processes per image, and it's a cause of lots of issues in the community -- I have seen a variant of this at literally every job since I started using k8s in 2017. Supervisors themselves are also their own can of worms -- see Emissary as an example of one of those cans of worms. Multiple goroutines are also substantially more efficient than multiple processes.

build complexity of multiple binaries + images
There is also the build complexity of multiple binaries and multiple images. The release pipeline is a bit complex as-is and already takes upwards of an hour to run, which is not good given the delayed feedback (and as such it does occasionally cause release issues already). Splitting these while still producing binaries and images for multiple architectures will certainly increase this build time, likely significantly, especially as we'd probably cap out on matrix parallelization, so some jobs would run in serial.

specific examples
I would probably say this is the best use-case, as it is a separate resource, can have high resource usage depending on the use-case, and is technically an optional resource too. Similar to CD's AppSet Controller.
Not the most intuitive, but since the controller can have replicas (hot standbys, currently), a webhook actually would be able to scale 😅

Summary -- suggest separate files and packages primarily, rather than separate deployments
In summary, while this could sound like a codebase simplification via separation of concerns, it would actually add substantial complexity in several other areas, resulting in a net increase in complexity rather than a decrease. Given our existing issues with complexity, this would add to them rather than reduce them. For the purpose of codebase simplification, I would suggest just continuing to split controllers into their own files and maybe packages, as both of us have suggested before (e.g. #13419 (comment)). That would get all of the codebase-specific benefits of this proposal without any of the downsides.

GH Markdown note: you can't use a short URL link within an in-line link -- this actually acted like an anchor tag that would link to a section of this issue, ending up linking to itself. I've edited this to include the full link.
Summary
Currently the workflow controller is a monolith with no sharding capabilities.
We would like to be able to shard it for scalability, but that is some way off. This proposal is not about sharding.
This is a discussion around splitting it into multiple controllers. For example:
Scalability
This should, overall, improve scalability. It would allow us to experiment with sharding in less risky and complex ways, as most of these controllers only handle one-off events, unlike the main controller.
It may put more read pressure on the k8s API, as we will be running separate informers. Their information doesn't overlap, though: workflows transition between them.
Strategies
I'd like to discuss which strategy we take with splitting:
Single binary, multiple uses
Many projects, including Argo CD, release a single binary which has many personalities depending upon the name of the binary that is invoked.
An alternative would be that you could specify the roles this controller takes on. A single deployment could take on all the roles, or they could be fully split across deployments, depending upon your needs.
Single binary and image per controller
As a software engineer, this is my preferred approach. The pain of getting there is higher, but separation of concerns becomes much stronger, and the binaries should benefit from the split.
The downside is that either we also produce a single-binary version for testing/small installations, or we no longer support the current deployment method of a single pod. This is mostly not of much concern to the many users who just install a manifest via kustomize or helm.
My choice
I would split into multiple binaries and images, and that would be the only deployment option. If we wanted to reverse this and deploy multiple controllers in one image in the future, we could do so via a supervisor daemon running all the binaries.
This would ensure separation of concerns, which I think is the big win here.