CD: Automate the activation of the self hosted runner #1128

Merged 1 commit on Mar 21, 2024

@barshaul (Collaborator) commented Mar 16, 2024

To activate the runner automatically, we did the following:

  1. Installed the SSM agent on the EC2 host.
  2. Created an SSM document that starts the runner process on the host.
  3. Created an IAM user for GitHub Actions and stored its keys as secrets in GitHub.
  4. Made the CD jobs wait for the runner to be activated.
  5. Made the CD job change ownership of the _work folder so that the repository can be checked out (see actions/checkout#211 (comment), Permission denied when "Deleting the contents of").
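
As a rough illustration of steps 2 and 4, here is a minimal Python sketch of how a job could run an SSM document on the host and wait for it to finish. It is not the workflow code from this PR. The function name, the instance ID and the document name are hypothetical; `ssm` is assumed to be a boto3 SSM client (`boto3.client("ssm")`), and only its `send_command` and `get_command_invocation` calls are used.

```python
import time


def start_runner_via_ssm(ssm, instance_id, document_name,
                         poll_seconds=5, timeout_seconds=300, sleep=time.sleep):
    """Run an SSM document on one EC2 instance and wait for the result.

    Returns True if the command reports Success, False if it fails,
    is cancelled, or does not finish within timeout_seconds.
    """
    # Ask SSM to run the (hypothetical) runner-start document on the host.
    response = ssm.send_command(InstanceIds=[instance_id],
                                DocumentName=document_name)
    command_id = response["Command"]["CommandId"]

    waited = 0
    while waited < timeout_seconds:
        # Sleep first: the invocation may not be queryable immediately
        # after send_command returns.
        sleep(poll_seconds)
        waited += poll_seconds
        invocation = ssm.get_command_invocation(CommandId=command_id,
                                                InstanceId=instance_id)
        status = invocation["Status"]
        if status == "Success":
            return True
        if status in ("Failed", "Cancelled", "TimedOut"):
            return False
        # Pending / InProgress / Delayed: keep polling.
    return False
```

With real credentials this could be called as `start_runner_via_ssm(boto3.client("ssm"), "<instance-id>", "<document-name>")`. Whether "activated" should mean "the SSM command succeeded" or "the runner shows as online in GitHub" is a design choice this sketch leaves open.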

https://github.com/machulav/ec2-github-runner/ wasn't used because it requires org:admin access to generate a registration key for the runner, which the aws org doesn't provide.

Closes #656

@barshaul force-pushed the self_host_runner branch 13 times, most recently from b2ac693 to 075d006, March 17, 2024 09:20
@barshaul changed the title from "CD: Automate the start&stop of the self hosted runner" to "CD: Automate the activation of the self hosted runner", Mar 17, 2024
@barshaul force-pushed the self_host_runner branch 10 times, most recently from ea0697a to cb24cf4, March 18, 2024 10:38
@barshaul marked this pull request as ready for review, March 18, 2024 11:31
@barshaul requested a review from a team as a code owner, March 18, 2024 11:31
@avifenesh (Collaborator) left a comment

I'm not sure it's a real problem, but I think it needs to be looked at. While running the self-hosted runner, I noticed that it doesn't kill itself once there are no more tasks, and if you then try to rerun the runner while the previous one is still alive in the background, it throws errors.
So it might need a different approach: perhaps running it only per specific task somehow, adding a kill command after the runs finish, or checking before re-running whether a runner is already alive.

@Yury-Fridlyand added the "CI/CD" (CI/CD related) label, Mar 18, 2024
@barshaul (Collaborator, Author) commented

Do you have logs of the errors you were seeing? The runner knows how to handle multiple processes trying to activate it, and the only issue I saw was an EACCES error while trying to check out the repo, which this PR solves.

@avifenesh (Collaborator) commented

My issue was that for some reason it didn't recognize new tasks, so I pressed Ctrl+C thinking that would kill it, tried to rerun it, and got errors telling me it was already running, yet it still didn't run new tasks. So I killed the process manually and started a new runner, which solved my issue.
Unfortunately I didn't save the logs, since I didn't see any reason to.

Please tag me if you respond in a separate comment so I'll get notified.
@barshaul

@barshaul (Collaborator, Author) commented

@avifenesh
I don't want to kill the process at the end of the job because, when we dispatch multiple tasks from the same PR (as we do here), some tasks are queued waiting for the runner in the publish-binaries job. If we kill the runner, the queued job waiting on it will never go back to the activation job.
I haven't seen this issue in my many tests of this automation, so I wonder whether your issue was really that an old action was still running or stuck, and so new actions weren't picked up until you killed the process.
Anyway, if we do hit this issue later on, we can debug and fix it then.
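
For reference, the "check before re-running whether a runner is already alive" option raised earlier in this thread could look roughly like the Python sketch below. It is only an illustration and is not part of this PR. It assumes the runner's listener process can be matched with `pgrep -f Runner.Listener` (the pattern is an assumption about the process name), and both function names are hypothetical.

```python
import subprocess


def runner_is_alive(pattern="Runner.Listener", run=subprocess.run):
    """Return True if a process whose command line matches pattern exists.

    Relies on pgrep's exit code: 0 means at least one match, 1 means none.
    The default pattern is an assumption about the runner's process name.
    """
    result = run(["pgrep", "-f", pattern], capture_output=True, text=True)
    return result.returncode == 0


def start_runner_if_needed(start, is_alive=runner_is_alive):
    """Call start() only when no runner process is found.

    Returns True if start() was called, False if a runner already existed.
    The existing runner is never killed, so queued jobs can still use it.
    """
    if is_alive():
        return False
    start()
    return True
```

Here `start` would be whatever launches the runner on the host (for example, the command the SSM document runs). Because the check never kills anything, an existing runner keeps serving jobs that are already queued.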

- name: Checkout
  uses: actions/checkout@v4
  with:
    submodules: "true"
Contributor

minor: you probably don't need the submodules.

- name: Checkout
  uses: actions/checkout@v4
  with:
    submodules: "true"
Contributor

same

.github/workflows/npm-cd.yml (resolved)
@barshaul merged commit 8f0e259 into main, Mar 21, 2024
13 checks passed
@barshaul deleted the self_host_runner branch, March 21, 2024 16:30
Labels: CI/CD (CI/CD related)
Linked issue: Integrate CD with On-demand self-hosted AWS EC2 runner for GitHub Actions
4 participants