What Operating System are you using (both controller, and any agents involved in the problem)?
Linux in Fargate containers, using images based on jenkins/jenkins:alpine-jdk17 and jenkins/inbound-agent:jdk17.
Reproduction steps
Wait until no ECS Cloud agents are running
Start a job that needs an agent (the main node does not run jobs)
Expected Results
A single new Fargate Task is started for a new Agent node and runs the Job.
Actual Results
A new Fargate Task is started by the ECS Cloud and runs the Job. Meanwhile a second, identical Fargate Task starts and does nothing as there are no other jobs to run.
After the idle period both are stopped.
Anything else?
I've tried adjusting various timeouts and other values such as poll times for Tasks, to no avail.
It seems like the Task is not considered started quickly enough to avoid Jenkins asking again for a provision?
There are two c.c.j.plugins.amazonecs.ECSCloud#provision: Asked to provision 1 agent(s) for: generic entries - here are what look like the important logs extracted from the full log included below:
2023-03-13 17:13:52.645+0000 [id=41] INFO c.c.j.plugins.amazonecs.ECSCloud#provision: Asked to provision 1 agent(s) for: generic
2023-03-13 17:14:43.191+0000 [id=6870] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted JNLP4-connect connection #16 from /10.28.137.77:48718
2023-03-13 17:14:52.645+0000 [id=29] INFO c.c.j.plugins.amazonecs.ECSCloud#provision: Asked to provision 1 agent(s) for: generic
2023-03-13 17:15:18.407+0000 [id=6855] INFO c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [fargate-agent-cloud-generic-211x7]: Task started, waiting for agent to become online
2023-03-13 17:15:18.407+0000 [id=6855] INFO c.c.j.p.amazonecs.ECSLauncher#waitForAgent: [fargate-agent-cloud-generic-211x7]: Agent connected
2023-03-13 17:15:18.459+0000 [id=6856] INFO c.c.j.p.amazonecs.ECSComputer#taskAccepted: [fargate-agent-cloud-generic-211x7]: JobName: FreeStyle1
2023-03-13 17:16:15.305+0000 [id=6901] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted JNLP4-connect connection #18 from /10.28.107.11:55180
2023-03-13 17:16:18.565+0000 [id=6874] INFO c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [fargate-agent-cloud-generic-zfqhk]: Task started, waiting for agent to become online
2023-03-13 17:16:18.565+0000 [id=6874] INFO c.c.j.p.amazonecs.ECSLauncher#waitForAgent: [fargate-agent-cloud-generic-zfqhk]: Agent connected
Node fargate-agent-cloud-generic-211x7 (first to launch) carried out the job.
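To make the timing in the extracted log easier to see, here is a small Python sketch that computes the gaps between the key events. It is not part of the plugin; the helper names (`parse_ts`, `timestamps`) are made up for illustration, and the log lines are copied from the extract above. It shows only that the second provisioning request arrived 60 seconds after the first, after the first agent's JNLP connection was accepted but before the launcher logged "Agent connected". That is consistent with the suspicion above, but it does not prove the cause.

```python
from datetime import datetime

# Key lines copied verbatim from the controller log extract.
LOG = """\
2023-03-13 17:13:52.645+0000 [id=41] INFO c.c.j.plugins.amazonecs.ECSCloud#provision: Asked to provision 1 agent(s) for: generic
2023-03-13 17:14:43.191+0000 [id=6870] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted JNLP4-connect connection #16 from /10.28.137.77:48718
2023-03-13 17:14:52.645+0000 [id=29] INFO c.c.j.plugins.amazonecs.ECSCloud#provision: Asked to provision 1 agent(s) for: generic
2023-03-13 17:15:18.407+0000 [id=6855] INFO c.c.j.p.amazonecs.ECSLauncher#waitForAgent: [fargate-agent-cloud-generic-211x7]: Agent connected
"""


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm+ZZZZ' timestamp of a log line."""
    date, time = line.split(" ", 2)[:2]
    return datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S.%f%z")


def timestamps(log, marker):
    """Return the timestamps of all lines that contain the given marker text."""
    return [parse_ts(line) for line in log.splitlines() if marker in line]


requests = timestamps(LOG, "Asked to provision")
accepted = timestamps(LOG, "Accepted JNLP4-connect")
connected = timestamps(LOG, "Agent connected")

# All delays are measured from the first provisioning request.
gap_between_requests = (requests[1] - requests[0]).total_seconds()
first_accept_delay = (accepted[0] - requests[0]).total_seconds()
first_connect_delay = (connected[0] - requests[0]).total_seconds()

print(f"second provision request after: {gap_between_requests:.3f}s")
print(f"first JNLP connection accepted after: {first_accept_delay:.3f}s")
print(f"first 'Agent connected' logged after: {first_connect_delay:.3f}s")
```

With these four lines the gaps come out as 60.000s, 50.546s and 85.762s respectively, so the second request falls roughly 26 seconds before the first agent is reported as connected.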
Full controller node log
2023-03-13 17:13:52.645+0000 [id=41] INFO c.c.j.plugins.amazonecs.ECSCloud#provision: Asked to provision 1 agent(s) for: generic
2023-03-13 17:13:52.645+0000 [id=41] INFO c.c.j.plugins.amazonecs.ECSCloud#provision: Will provision fargate-agent-cloud-generic-211x7, for label: generic
2023-03-13 17:14:02.682+0000 [id=40] INFO hudson.slaves.NodeProvisioner#update: fargate-agent-cloud-generic-211x7 provisioning successfully completed. We have now 2 computer(s)
2023-03-13 17:14:02.721+0000 [id=6855] INFO c.c.j.p.amazonecs.ECSLauncher#runECSTask: [fargate-agent-cloud-generic-211x7]: Starting agent with task definition arn:aws:ecs:eu-west-1:014056181913:task-definition/jenkins-fargate-test-agent-generic:16}
2023-03-13 17:14:03.293+0000 [id=6855] INFO c.c.j.p.amazonecs.ECSLauncher#runECSTask: [fargate-agent-cloud-generic-211x7]: Agent started with task arn : arn:aws:ecs:eu-west-1:014056181913:task/jenkins-fargate-test/07834b1193bf45db8ac88095db2fe31b
2023-03-13 17:14:03.293+0000 [id=6855] INFO c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [fargate-agent-cloud-generic-211x7]: TaskArn: arn:aws:ecs:eu-west-1:014056181913:task/jenkins-fargate-test/07834b1193bf45db8ac88095db2fe31b
2023-03-13 17:14:03.293+0000 [id=6855] INFO c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [fargate-agent-cloud-generic-211x7]: TaskDefinitionArn: arn:aws:ecs:eu-west-1:014056181913:task-definition/jenkins-fargate-test-agent-generic:16
2023-03-13 17:14:03.293+0000 [id=6855] INFO c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [fargate-agent-cloud-generic-211x7]: ClusterArn: arn:aws:ecs:eu-west-1:014056181913:cluster/jenkins-fargate-test
2023-03-13 17:14:03.293+0000 [id=6855] INFO c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [fargate-agent-cloud-generic-211x7]: ContainerInstanceArn: null
2023-03-13 17:14:43.110+0000 [id=6869] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Connection #15 from /10.28.137.77:48708 failed: null
2023-03-13 17:14:43.191+0000 [id=6870] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted JNLP4-connect connection #16 from /10.28.137.77:48718
2023-03-13 17:14:52.645+0000 [id=29] INFO c.c.j.plugins.amazonecs.ECSCloud#provision: Asked to provision 1 agent(s) for: generic
2023-03-13 17:14:52.645+0000 [id=29] INFO c.c.j.plugins.amazonecs.ECSCloud#provision: Will provision fargate-agent-cloud-generic-zfqhk, for label: generic
2023-03-13 17:15:02.679+0000 [id=38] INFO hudson.slaves.NodeProvisioner#update: fargate-agent-cloud-generic-zfqhk provisioning successfully completed. We have now 3 computer(s)
2023-03-13 17:15:02.715+0000 [id=6874] INFO c.c.j.p.amazonecs.ECSLauncher#runECSTask: [fargate-agent-cloud-generic-zfqhk]: Starting agent with task definition arn:aws:ecs:eu-west-1:014056181913:task-definition/jenkins-fargate-test-agent-generic:16}
2023-03-13 17:15:03.448+0000 [id=6874] INFO c.c.j.p.amazonecs.ECSLauncher#runECSTask: [fargate-agent-cloud-generic-zfqhk]: Agent started with task arn : arn:aws:ecs:eu-west-1:014056181913:task/jenkins-fargate-test/ea715dd06a4d4b8389773367bb72e817
2023-03-13 17:15:03.448+0000 [id=6874] INFO c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [fargate-agent-cloud-generic-zfqhk]: TaskArn: arn:aws:ecs:eu-west-1:014056181913:task/jenkins-fargate-test/ea715dd06a4d4b8389773367bb72e817
2023-03-13 17:15:03.448+0000 [id=6874] INFO c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [fargate-agent-cloud-generic-zfqhk]: TaskDefinitionArn: arn:aws:ecs:eu-west-1:014056181913:task-definition/jenkins-fargate-test-agent-generic:16
2023-03-13 17:15:03.448+0000 [id=6874] INFO c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [fargate-agent-cloud-generic-zfqhk]: ClusterArn: arn:aws:ecs:eu-west-1:014056181913:cluster/jenkins-fargate-test
2023-03-13 17:15:03.448+0000 [id=6874] INFO c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [fargate-agent-cloud-generic-zfqhk]: ContainerInstanceArn: null
2023-03-13 17:15:18.407+0000 [id=6855] INFO c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [fargate-agent-cloud-generic-211x7]: Task started, waiting for agent to become online
2023-03-13 17:15:18.407+0000 [id=6855] INFO c.c.j.p.amazonecs.ECSLauncher#waitForAgent: [fargate-agent-cloud-generic-211x7]: Agent connected
2023-03-13 17:15:18.459+0000 [id=6856] INFO c.c.j.p.amazonecs.ECSComputer#taskAccepted: [fargate-agent-cloud-generic-211x7]: JobName: FreeStyle1
2023-03-13 17:15:18.459+0000 [id=6856] INFO c.c.j.p.amazonecs.ECSComputer#taskAccepted: [fargate-agent-cloud-generic-211x7]: JobUrl: job/FreeStyle1/
2023-03-13 17:16:15.185+0000 [id=6900] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Connection #17 from /10.28.107.11:55170 failed: null
2023-03-13 17:16:15.305+0000 [id=6901] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted JNLP4-connect connection #18 from /10.28.107.11:55180
2023-03-13 17:16:18.565+0000 [id=6874] INFO c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [fargate-agent-cloud-generic-zfqhk]: Task started, waiting for agent to become online
2023-03-13 17:16:18.565+0000 [id=6874] INFO c.c.j.p.amazonecs.ECSLauncher#waitForAgent: [fargate-agent-cloud-generic-zfqhk]: Agent connected
2023-03-13 17:20:56.862+0000 [id=35] INFO h.slaves.CloudRetentionStrategy#check: Disconnecting fargate-agent-cloud-generic-211x7
2023-03-13 17:20:56.862+0000 [id=35] INFO c.c.j.plugins.amazonecs.ECSSlave#_terminate: [fargate-agent-cloud-generic-211x7]: Stopping: TaskArn arn:aws:ecs:eu-west-1:014056181913:task/jenkins-fargate-test/07834b1193bf45db8ac88095db2fe31b, ClusterArn arn:aws:ecs:eu-west-1:014056181913:cluster/jenkins-fargate-test
2023-03-13 17:20:56.866+0000 [id=35] INFO c.c.j.p.amazonecs.ECSService#stopTask: Delete ECS agent task: arn:aws:ecs:eu-west-1:014056181913:task/jenkins-fargate-test/07834b1193bf45db8ac88095db2fe31b
2023-03-13 17:20:56.928+0000 [id=35] INFO h.slaves.CloudRetentionStrategy#check: Disconnecting fargate-agent-cloud-generic-zfqhk
2023-03-13 17:20:56.928+0000 [id=35] INFO c.c.j.plugins.amazonecs.ECSSlave#_terminate: [fargate-agent-cloud-generic-zfqhk]: Stopping: TaskArn arn:aws:ecs:eu-west-1:014056181913:task/jenkins-fargate-test/ea715dd06a4d4b8389773367bb72e817, ClusterArn arn:aws:ecs:eu-west-1:014056181913:cluster/jenkins-fargate-test
2023-03-13 17:20:56.929+0000 [id=6948] INFO j.s.DefaultJnlpSlaveReceiver#channelClosed: Computer.threadPoolForRemoting [#62] for fargate-agent-cloud-generic-211x7 terminated: java.nio.channels.ClosedChannelException
2023-03-13 17:20:56.930+0000 [id=35] INFO c.c.j.p.amazonecs.ECSService#stopTask: Delete ECS agent task: arn:aws:ecs:eu-west-1:014056181913:task/jenkins-fargate-test/ea715dd06a4d4b8389773367bb72e817
2023-03-13 17:20:56.985+0000 [id=6949] INFO j.s.DefaultJnlpSlaveReceiver#channelClosed: Computer.threadPoolForRemoting [#63] for fargate-agent-cloud-generic-zfqhk terminated: java.nio.channels.ClosedChannelException
I more or less solved this by increasing the initialDelay of NodeProvisioner:
{
name = "JAVA_OPTS"
value = "-Dhudson.slaves.NodeProvisioner.initialDelay=60 -Dhudson.slaves.NodeProvisioner.MARGIN=50 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85"
}
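As a quick sanity check, and only as a sketch, the `JAVA_OPTS` value can be split into its `-D` system properties to confirm which NodeProvisioner settings are actually passed to the controller. The helper name `system_properties` is hypothetical. What each value means (units, defaults) should be checked against the Jenkins documentation on system properties rather than inferred from this snippet.

```python
import shlex

# The JAVA_OPTS value from the container definition above.
JAVA_OPTS = (
    "-Dhudson.slaves.NodeProvisioner.initialDelay=60 "
    "-Dhudson.slaves.NodeProvisioner.MARGIN=50 "
    "-Dhudson.slaves.NodeProvisioner.MARGIN0=0.85"
)


def system_properties(java_opts):
    """Collect -Dkey=value tokens from a JAVA_OPTS string into a dict of strings."""
    props = {}
    for token in shlex.split(java_opts):
        if token.startswith("-D") and "=" in token:
            key, value = token[2:].split("=", 1)
            props[key] = value
    return props


props = system_properties(JAVA_OPTS)
for key, value in props.items():
    print(f"{key} = {value}")
```

Values are kept as strings on purpose, since the JVM also receives them as plain text and each property is parsed by Jenkins itself.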
Thanks, that's interesting - I hadn't delved that deep into the internal configuration before.
I ended up writing my own plugin, which of course is much easier to do for a single use case that I define for myself, than for a general tool for a mass audience - I don't underestimate the effort of maintaining a public plugin.
(Aside: even making my own simple plugin for Fargate would have been almost impossible given the state of the plugin development documentation, without using this plugin as inspiration.)