-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
duplicate hook numbers sending concurrent webhooks #997
Comments
@Lxx-c @NaiveShiki we've deleted both of your comments as they seem like bot driven activity rather than real people: However, if we misunderstood the intention of your comments, please feel free to contribute back to the discussion. |
To add some additional context, I think the fact that this is a recent development points to this change as a potential culprit. Previously, I do wonder if this is yet another push (pun intended) to creating a job queue model for handling webhook payloads... I know this is what GitHub recommends:
|
@ecrupper thanks for that link and extra context! We received some additional feedback that this behavior may have been occurring for longer than what we thought. From what we've gathered, the repos that had already opted in to the automatic branch deletion feature within GitHub have been experiencing this issue for quite some time ("months"). So the "about a month or so" window we received from this most recent report was the first we had heard of it but other people started chiming in with their own experiences after. Given all of this, we agree what you shared appears to be the most likely culprit because it was introduced in the v0.23 release of the Vela server and the timelines we've gathered from all of these reports matches around the same time we performed our upgrade internally to that version. We briefly discussed this internally and also agree that this further elevates the need to create a different method for processing webhooks and following GitHub's recommendations sounds like the right path forward for that. In the meantime, we are planning to add retry logic that follows the same pattern as the build number issue we've faced in the past. We acknowledge that this is a "bandaid fix" but our thought process is it seems like a lower effort code introduction that should workaround the issue and ideally mitigate it from happening completely. This should enable us sufficient time to create a better pattern for processing webhooks in a queue based model like GitHub recommends. |
A PR has been added to mitigate the frequency of this issue: However, we aren't closing this ticket because it's still possible for it to show up similar to: |
Description
There is a known issue that spamming webhooks can produce an error with duplicate build numbers for the server:
To minimize the impact of this issue, we introduced some retry logic in the webhook workflow:
#213 (comment)
However, recently we started noticing a slightly different flavor of this error specific to
hooks
:We believe this is the same kind of underlying problem where concurrent webhooks are being processed and attempted to be inserted into the database for a repository with the same number. After some exploration, we didn't see any evidence that suggests we have the same retry logic in place for the
hooks
table like we setup for thebuilds
table:https://github.com/go-vela/server/blob/main/api/webhook/post.go#L261-L271
We should add additional logic to account for this problem for the
hooks
table like we did for thebuilds
table.Value
Useful Information
vela --version
?v0.23.4
https://www.flatcar.org/
The reports we've received thus far have been small so we're still attempting to gather additional data and information. These reports seem to be specific to "~10% of the time we'll notice that when we merge a pull request the push does not get processed by Vela". After some exploration, we've identified that those repositories have automatic branch deletion configured:
https://docs.github.com/en/[email protected]/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-the-automatic-deletion-of-branches
On the GitHub side, we can confirm that this behavior causes two
push
events being fired at the same exact time:push
to the targetbranch
for the PRpush
for the deletion event of the sourcebranch
from the PRAlso, these reports are from a more recent time window of "about the last month or so". Unfortunately, we're not able to completely verify exactly when this happened because GitHub doesn't store webhook deliveries that far back.
The text was updated successfully, but these errors were encountered: