- When the job ends in error (because the Pulsar runner's `fail_job()` calls `stop_job()`)
Circumstances where the job is not removed:
- When the job finishes normally
- When the job hits its (k8s) walltime and is killed by k8s
This last one is a source of job "loss" (stuck non-terminal) because Pulsar will never send a terminal status update. The runner should probably poll (as in galaxyproject/galaxy#9911) for this case.
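
A minimal sketch of what such a poll might check, assuming the walltime is enforced through the Job's `activeDeadlineSeconds` (in which case Kubernetes marks the Job with a `Failed` condition whose reason is `DeadlineExceeded`). The function name, the plain-dict input, and the returned state strings are hypothetical and are not Pulsar's or Galaxy's actual code:

```python
from typing import Optional


def terminal_state_from_job_status(status: dict) -> Optional[str]:
    """Map a Kubernetes Job ``status`` dict to a terminal state.

    Returns None while the Job has no terminal condition yet.
    Hypothetical helper: the returned strings are illustrative only.
    """
    for condition in status.get("conditions") or []:
        # Only conditions that are currently true are relevant.
        if condition.get("status") != "True":
            continue
        if condition.get("type") == "Complete":
            return "ok"
        if condition.get("type") == "Failed":
            # Reason reported when activeDeadlineSeconds is exceeded.
            if condition.get("reason") == "DeadlineExceeded":
                return "walltime_exceeded"
            return "failed"
    return None
```

A runner that polls could call this on each monitored Job and synthesize the terminal status update that Pulsar never sends.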
The quickest and easiest (and IMO correct) solution would be to set the TTL in the template as described in the docs. But it would also be a good idea to call MessageCoexecutionPodJobClient.kill() for all jobs when their terminal message is received.
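
Both ideas can be sketched roughly as follows. `ttlSecondsAfterFinished` is a standard Kubernetes Job spec field; everything else here is an assumption for illustration: the helper names, the set of terminal status strings, and the idea that the client object exposes a no-argument `kill()` as named above.

```python
import copy

# Assumed terminal status names, for illustration only.
TERMINAL_STATES = {"complete", "failed", "cancelled"}


def with_ttl(job_manifest: dict, ttl_seconds: int = 300) -> dict:
    """Return a copy of a Job manifest with ttlSecondsAfterFinished set.

    Kubernetes then deletes the finished Job after ttl_seconds.
    """
    manifest = copy.deepcopy(job_manifest)
    manifest.setdefault("spec", {})["ttlSecondsAfterFinished"] = ttl_seconds
    return manifest


def on_status_update(client, status: str) -> bool:
    """Call client.kill() when a terminal status arrives.

    Hypothetical handler: returns True if cleanup was triggered.
    """
    if status in TERMINAL_STATES:
        client.kill()
        return True
    return False
```

The TTL handles cleanup even if no terminal message ever arrives, while the explicit `kill()` removes the Job promptly in the normal case.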