-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pod connection issue reproducer #1564
base: master
Are you sure you want to change the base?
Conversation
For some reason I don't understand the pod jnlp container thinks it's connected but the controller thinks it's not and keeps waiting for it. `jnlp` container logs: ``` May 29, 2024 9:41:09 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir INFO: Using /home/jenkins/agent/remoting as a remoting work directory May 29, 2024 9:41:12 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting May 29, 2024 9:41:12 AM hudson.remoting.Launcher createEngine INFO: Setting up agent: test0-1-0jmg0-hp6gs-0hjx9 May 29, 2024 9:41:13 AM hudson.remoting.Engine startEngine INFO: Using Remoting version: 3248.v65ecb_254c298 May 29, 2024 9:41:13 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir INFO: Using /home/jenkins/agent/remoting as a remoting work directory May 29, 2024 9:41:21 AM hudson.remoting.Launcher$CuiListener status INFO: Locating server among [http://jenkins.kubernetes-plugin-test.svc.cluster.local:24289/jenkins/] May 29, 2024 9:41:23 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping] May 29, 2024 9:41:23 AM hudson.remoting.Launcher$CuiListener status INFO: Agent discovery successful Agent address: jenkins.kubernetes-plugin-test.svc.cluster.local Agent port: 24290 Identity: 2f:8b:3e:9a:28:e7:d0:c5:24:25:18:72:60:8f:aa:b5 May 29, 2024 9:41:23 AM hudson.remoting.Launcher$CuiListener status INFO: Handshaking May 29, 2024 9:41:23 AM hudson.remoting.Launcher$CuiListener status INFO: Connecting to jenkins.kubernetes-plugin-test.svc.cluster.local:24290 May 29, 2024 9:41:24 AM hudson.remoting.Launcher$CuiListener status INFO: Server reports protocol JNLP4-connect-proxy not supported, skipping May 29, 2024 9:41:24 AM hudson.remoting.Launcher$CuiListener status INFO: Trying protocol: JNLP4-connect May 29, 2024 9:41:26 AM org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader run INFO: Waiting for ProtocolStack to start. May 29, 2024 9:41:31 AM hudson.remoting.Launcher$CuiListener status INFO: Remote identity confirmed: 2f:8b:3e:9a:28:e7:d0:c5:24:25:18:72:60:8f:aa:b5 May 29, 2024 9:41:32 AM hudson.remoting.Launcher$CuiListener status INFO: Connected ``` Controller logs: ``` 2024-05-29 09:41:00.737+0000 [id=128] INFO o.c.j.p.k.KubernetesLauncher#launch: Pod is running: kubernetes kubernetes-plugin-test/test0-1-0jmg0-hp6gs-0hjx9 2024-05-29 09:41:23.418+0000 [id=166] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Connection jenkinsci#2 from /10.244.0.14:36382 failed: null 2024-05-29 09:41:26.810+0000 [id=172] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted JNLP4-connect connection jenkinsci#3 from /10.244.0.14:36398 2024-05-29 09:41:31.115+0000 [id=128] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (30/1,000): test0-1-0jmg0-hp6gs-0hjx9 2024-05-29 09:42:01.438+0000 [id=128] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (60/1,000): test0-1-0jmg0-hp6gs-0hjx9 ```
containers: | ||
- name: jnlp | ||
resources: | ||
requests: | ||
cpu: 100m | ||
memory: 256Mi | ||
limits: | ||
cpu: 100m | ||
memory: 256Mi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remove this, then the test passes O_o
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CPU limit would be throttling the pod and changing timing conditions. Maybe that is why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the CPU throttling is just too much, and the agent never ends up as "online" on the controller side within the expected timeout despite establishing the remoting connection.
This may be an opportunity to profile the agent initialization (remote classloading, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least Waiting for agent to connect could be reworded to emphasize that the controller might be waiting for the agent to be fully initialized and ready to use.
For some reason I don't understand the pod jnlp container thinks it's connected but the controller thinks it's not and keeps waiting for it.
To run the test I'm using:
jnlp
container logs:Controller logs:
Testing done
Submitter checklist