Pod connection issue reproducer #1564

Draft. Wants to merge 1 commit into base: master
@@ -31,8 +31,8 @@ public void restartDuringPodLaunch() throws Throwable {
// the pod is created, but not connected yet
rjr.runRemotely(new AssertBuildLogMessage("Created Pod", build));
// restart
-rjr.stopJenkins();
-rjr.startJenkins();
+//rjr.stopJenkins();
+//rjr.startJenkins();
// update k8s to make a node suitable to schedule (add disktype=special to the node)
System.out.println("Adding label to node....");
try (KubernetesClient client = new KubernetesClientBuilder().build()) {
@@ -2,6 +2,15 @@ podTemplate(yaml: '''
apiVersion: v1
kind: Pod
spec:
+  containers:
+  - name: jnlp
+    resources:
+      requests:
+        cpu: 100m
+        memory: 256Mi
+      limits:
+        cpu: 100m
+        memory: 256Mi
Comment on lines +5 to +13
Member Author

If I remove this, then the test passes O_o

Member

The CPU limit would be throttling the pod and changing timing conditions. Maybe that is why.
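
To put rough numbers on the throttling, here is a minimal sketch of the arithmetic, assuming the default 100 ms CFS enforcement period (a cluster may configure a different one). The class name, the helper names, and the 5 seconds of CPU time are hypothetical and only illustrate the scale; nothing here was measured in this PR.

```java
import java.time.Duration;

/** Back-of-the-envelope arithmetic for a millicore CPU limit (illustrative only). */
final class CpuLimitEstimate {
    // Assumption: default CFS period of 100 ms (cpu.cfs_period_us = 100000).
    static final long DEFAULT_PERIOD_MICROS = 100_000;

    /** CPU time in microseconds the container may consume per enforcement period. */
    static long quotaMicros(long limitMillicores, long periodMicros) {
        return limitMillicores * periodMicros / 1000;
    }

    /**
     * Lower bound on wall-clock time for work that needs the given CPU time,
     * if the CPU limit is the only bottleneck (1000 millicores == one core).
     */
    static Duration minWallClock(Duration cpuTimeNeeded, long limitMillicores) {
        return cpuTimeNeeded.multipliedBy(1000).dividedBy(limitMillicores);
    }

    public static void main(String[] args) {
        // cpu: 100m, as in the pod template above
        System.out.println(quotaMicros(100, DEFAULT_PERIOD_MICROS)); // prints 10000
        // Hypothetical: initialization that needs 5 s of CPU time
        System.out.println(minWallClock(Duration.ofSeconds(5), 100)); // prints PT50S
    }
}
```

With `cpu: 100m` the container gets about 10 ms of CPU per 100 ms period, so any CPU-bound startup work is stretched roughly tenfold in wall-clock time.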

Member

@Vlatombe Vlatombe May 29, 2024

I think the CPU throttling is just too much, and the agent never ends up as "online" on the controller side within the expected timeout despite establishing the remoting connection.

This may be an opportunity to profile the agent initialization (remote classloading, etc.)
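
The failure mode described above (connection established, but the agent not "online" before the timeout) can be sketched as a deadline-bounded poll. This is not the plugin's actual wait logic; `OnlineWait`, `await`, and the `online` predicate are hypothetical names used only to show why a slow initialization and a missing connection look the same to the waiter.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: poll a readiness predicate until it holds or a deadline passes. */
final class OnlineWait {
    /**
     * Returns true if {@code online} became true before the timeout, false otherwise.
     * A false result does not say why: the agent may never have connected, or it may
     * have connected and still be initializing when the deadline passed.
     */
    static boolean await(BooleanSupplier online, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (online.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            try {
                // Sleep for the poll interval, but never past the deadline.
                long sleepMillis = Math.min(pollInterval.toMillis(),
                        Math.max(1, remainingNanos / 1_000_000));
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }
}
```

Timing each initialization phase separately (for example with `System.nanoTime()` around remote classloading) would show which phase eats the budget under a tight CPU limit.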

Member

At the very least, the "Waiting for agent to connect" message could be reworded to make clear that the controller might be waiting for the agent to be fully initialized and ready to use, not just for the connection itself.

  nodeSelector:
    disktype: special
''') {