Ingress nginx scaling to max due to memory #12167
Comments
This issue is currently awaiting triage. If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
You can check the logs of the controller pods and hardcode the number of workers.
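For readers following along, a minimal sketch of checking the controller logs, assuming the chart-default deployment name ingress-nginx-controller in the ingress-nginx namespace (adjust the names to your release):

```sh
# Tail recent controller logs with timestamps so they can be correlated
# with the memory graphs discussed later in this thread.
kubectl -n ingress-nginx logs deploy/ingress-nginx-controller --timestamps --tail=200
```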
/kind support
Can you observe your request traffic? Have you encountered more requests, or are there many large requests?
The controller pod logs contain the request data, but nothing specific about failures or OOM errors.
@tao12345666333, we do not observe any abnormal traffic coming into the ingress layer; it looks like regular traffic.
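One hedged way to quantify that is to look at the controller's request counter, assuming the chart's Prometheus metrics are enabled on the default port 10254 and the chart-default deployment name:

```sh
# Forward the metrics port and inspect the request counter; a rising rate here
# would point to traffic growth rather than purely static memory usage.
kubectl -n ingress-nginx port-forward deploy/ingress-nginx-controller 10254:10254 &
curl -s http://localhost:10254/metrics | grep nginx_ingress_controller_requests
```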
@longwuyuan, can you please elaborate a bit more on what needs to be done to hardcode the number of workers?
@longwuyuan, I see the below in the running ingress pod. Should this value be sufficient to continue? I tried manually reducing worker_processes to 8 on a few nodes and observed that the memory consumption seemed to be reduced. Please suggest.
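By default worker_processes is auto, i.e. one worker per CPU core. If you want to pin it persistently instead of editing running pods, a sketch using the chart's ConfigMap values, with the value 8 taken from the comment above rather than as a recommendation and ingress-nginx/ingress-nginx assumed as the chart reference:

```sh
# Persist the worker count via the controller ConfigMap (chart value controller.config),
# so it survives pod restarts and rollouts.
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --reuse-values \
  --set controller.config.worker-processes="8"
```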
There are a few things coming into play here.

The static memory consumption of the Ingress NGINX Controller partially depends on your cluster size, i.e. nodes and pods, and the amount of Ingress resources. In the past I observed Ingress NGINX Controller pods consuming up to 4 GB of memory right after startup because the cluster contained both a lot of nodes/pods and around 2,500 Ingress resources. This memory consumption still does not take actual traffic into account and is a design flaw of our current implementation: the control plane, which consumes the memory for internal operations, runs in the same container as the data plane, which is actually doing the heavy lifting.

If you now use HPA to scale your deployment and expect it to do so depending on the actual load produced by traffic, you might hit your target average memory utilization just with the static data produced by what your environment looks like (again, the number of nodes, pods and Ingresses influences this). This can especially become a problem when you start with resource and HPA settings for a smaller setup and then slowly grow to the aforementioned point.

Is the actual memory consumption this big right after pod startup, or does it grow over time? The former would confirm my assumption, while the latter could be caused by a memory leak. For the former you will probably need to tweak your resource requests and/or HPA settings. Sadly we cannot overcome this design flaw at the moment, but we are planning to split the controller into a control plane and a data plane in the future. For the latter I'd recommend you update to the latest stable release of our controller first, if not already on it, and verify again.

Regards
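To make that interaction concrete, a sketch of the chart's built-in memory-based HPA settings referred to above; the numbers are placeholders rather than recommendations, and ingress-nginx/ingress-nginx is an assumed chart reference:

```sh
# If static memory alone already approaches
# requests.memory * targetMemoryUtilizationPercentage / 100,
# the HPA will scale out regardless of traffic.
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --reuse-values \
  --set controller.autoscaling.enabled=true \
  --set controller.autoscaling.minReplicas=3 \
  --set controller.autoscaling.maxReplicas=10 \
  --set controller.autoscaling.targetMemoryUtilizationPercentage=80
```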
@sivamalla42, since your graph shows the increase started after 9/24, you have no other choice but to first look at all other helpful graphs and correlate them with the log message timestamps. The idea is to find out whether memory increased for handling requests or not.
@Gacko,
Hey, sorry, I missed this information in your initial issue description. Well, at best you'd upgrade to v1.11.3. But it would be interesting to know whether the memory consumption rises over time or is high from the very beginning. Regards
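A hedged upgrade sketch: chart 4.11.3 should ship controller v1.11.3, but confirm the mapping with `helm search repo ingress-nginx --versions` and review the release notes before upgrading.

```sh
# Upgrade the release in place; --reuse-values keeps the existing configuration.
helm repo update
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --reuse-values \
  --version 4.11.3
```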
@Gacko, the pods were consuming memory over time. When they are restarted, they take a while to build up memory usage, but when we add more pods, they start consuming memory right away.
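A simple way to separate static from traffic-driven consumption, as asked above, is to watch per-pod memory right after a fresh pod starts; this sketch assumes metrics-server is installed:

```sh
# Snapshot current usage, then watch a freshly started pod's memory over time.
kubectl -n ingress-nginx top pods --containers
watch -n 30 'kubectl -n ingress-nginx top pods'
```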
Hello,
This sounds like your cluster is just big and Ingress NGINX is therefore consuming a comparably large amount of static memory. v1.10.x is out of support. You can of course just use v1.10.5, but this is up to you. We cannot make recommendations about versions other than the latest stable one. Regards
Are you using rate limits?
Nope, we have not set rate limits.
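For readers who land here later: rate limiting in ingress-nginx is configured per Ingress via annotations. A minimal sketch, with my-ingress and the value 10 as purely hypothetical placeholders:

```sh
# Limit requests per second from a single client IP for one Ingress resource.
kubectl annotate ingress my-ingress nginx.ingress.kubernetes.io/limit-rps="10"
```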
@Gacko,
Sorry, but I don't understand how this connects to my recent questions. I was asking you to investigate the static resource consumption right after you start a pod, without any load. This gives insight into how much memory the controller uses just for the bare cluster state. If you already exceed or are close to your target average memory utilization in idle mode, then you will need to increase the memory requests. As stated before: I know this is not perfect and we are targeting to solve this issue by splitting the controller into a control plane and a data plane.
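Increasing the requests would look roughly like the sketch below; the 2Gi figure is a placeholder to be derived from the idle-state measurement discussed above, not a recommendation:

```sh
# Raise the memory request so idle (static) usage sits well below the HPA's
# target average memory utilization.
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --reuse-values \
  --set controller.resources.requests.memory=2Gi
```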
Hi All,
We observe strange behaviour with the ingress-nginx pods in our production cluster: the pods started scaling to the maximum due to memory usage.
EKS: 1.29
```
helm list -n ingress-nginx
NAME           NAMESPACE      REVISION  UPDATED                               STATUS    CHART                APP VERSION
ingress-nginx  ingress-nginx  1         2024-05-01 11:27:32.802401 +0530 IST  deployed  ingress-nginx-4.8.3  1.9.4
```
Not sure why we suddenly started observing this behaviour. There is no clue as to why it started or how to fix it.
Even if we increase the number of pods, the memory is still being consumed and the pods scale up again.
Any help is very much appreciated.
Thanks
Siva