[Releaser] Implement a retry mechanism to survive random failures caused by github #48
Comments
Oh, one question: is it about releasing to GitHub Release Pages or to PyPI? Just to be specific.
Sorry about the missing long link. Here it is :) It is just a simple GitHub release; no PyPI is involved here. Doing retries at the YAML level doesn't seem to be natively supported, so it would be really appreciated if the releaser action could be more resilient to infra glitches :)
And I just got another error: Post "https://uploads.github.com/repos/kikmon/huc/releases/68184973/assets?label=&name=huc-2022-06-01-Darwin.zip": read tcp 172.17.0.2:46068->140.82.113.14:443: read: connection reset by peer
@kikmon, @epsilon-0, this is an annoying issue that has been bugging us since this Action was created. At first, we used the GitHub API through PyGithub, and it failed very frequently. Then we changed to using the GitHub CLI (459faf8). That reduced the frequency of failures, but they are still common. I believe it's because of the stability/reliability of the free infrastructure provided by GitHub. I find that small files rarely fail, but larger ones, which need to keep the connection alive, are tricky.

A few months ago, GitHub added the ability to restart individual jobs in CI runs. Hence, the strategy I've been following is to upload all the "assets" as artifacts and then have a last job in the workflow which just picks them up and uploads them to the release through the releaser. When a failure occurs, only that job needs to be restarted.

Nonetheless, I of course want to improve the reliability of the releaser Action. I think that a retry won't always work. Precisely because of the feature explained in the previous paragraph, I do manually restart the CI in https://github.com/ghdl/ghdl. Sometimes it works, but rather frequently it is not effective: the infrastructure is unreliable for some minutes or hours, and I need to wait until later, or until the next day, to restart.

As a result, when implementing a retry strategy, we should consider that retrying multiple times within a few minutes might be worthless. Instead, large wait times should be implemented. That can be complex, because workflows might be running against the 6 h limit, so there might not be time to wait until the API is stable again. We can either:
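Whichever option we pick, the core of such a retry helper can stay small. Here is a minimal sketch in Python, assuming the upload is still driven through check_call on a GitHub CLI command; the function name, delays, and overall deadline below are illustrative, not part of the releaser:

    import subprocess
    import time

    # Illustrative sketch: retry a flaky upload with capped exponential backoff
    # and an overall deadline, so waiting never consumes the 6 h workflow limit.
    def upload_with_retries(cmd, env=None, attempts=5,
                            base_delay=30, max_delay=600, deadline=1800):
        start = time.monotonic()
        for attempt in range(1, attempts + 1):
            try:
                subprocess.check_call(cmd, env=env)
                return
            except subprocess.CalledProcessError:
                if attempt == attempts:
                    raise  # Out of attempts; surface the failure.
                # Back off: 30 s, 60 s, 120 s, ... capped at max_delay.
                delay = min(base_delay * 2 ** (attempt - 1), max_delay)
                if time.monotonic() - start + delay > deadline:
                    raise  # No time left to wait for the API to recover.
                time.sleep(delay)

    # Example: retry one asset upload through the GitHub CLI.
    # upload_with_retries(["gh", "release", "upload", "v1.0.0", "asset.zip"])

The deadline guard is what addresses the 6 h concern: rather than sleeping past the point where the job would be killed anyway, the helper gives up and lets the restartable last-job strategy take over.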
Thanks for the explanations.
I'm open to accepting pull requests. Please also see #82.
I'm getting some random failures when publishing a package, and a retry fixes the issue. This is causing a lot of noise in the pipeline, so is it possible to add some retry policies to the releaser action? The action call is very simple, pushing three small (2.5 MB) zip files.
Here's the kind of error I'm getting:
Post "https://uploads.github.com/repos/kikmon/huc/releases/67601582/assets?label=&name=huc-2022-05-31-Darwin.zip": http2: client connection force closed via ClientConn.Close
Traceback (most recent call last):
  File "/releaser.py", line 187, in <module>
    check_call(cmd, env=env)
  File "/usr/local/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
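One detail the traceback exposes: check_call only reports an exit code, so a retry policy inside releaser.py would need the CLI's stderr to tell transient network glitches apart from permanent errors. A possible sketch, where the helper name is illustrative and the error substrings are taken from the failures quoted in this issue:

    import subprocess

    # Error fragments seen in this issue; both indicate transient network trouble.
    TRANSIENT_MARKERS = (
        "connection reset by peer",
        "client connection force closed",
    )

    def run_upload(cmd, env=None):
        """Return True on success, False on a transient failure; raise otherwise."""
        result = subprocess.run(cmd, env=env, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        if any(marker in result.stderr for marker in TRANSIENT_MARKERS):
            return False  # Looks like an infra glitch: worth retrying later.
        raise subprocess.CalledProcessError(result.returncode, cmd,
                                            output=result.stdout,
                                            stderr=result.stderr)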