Add sanity caching and retry controls #1744
Codecov Report
Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1744      +/-   ##
============================================
+ Coverage     80.16%   80.41%   +0.24%
- Complexity     2302     2309       +7
============================================
  Files           217      218       +1
  Lines          6964     7015      +51
  Branches        371      371
============================================
+ Hits           5583     5641      +58
+ Misses         1150     1141       -9
- Partials        231      233       +2
*/
<E extends Throwable> T get(SupplierThrows<T, E> query) throws E {
    synchronized (lock) {
        if (Instant.now().getEpochSecond() > lastQueriedAtEpochSeconds) {
Maybe the delay of "1 second" should be configurable as well?
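For illustration only, here is a minimal sketch of what a configurable window could look like. It is not the class from this PR: `SanityCacheSketch`, `ThrowingSupplier` (a stand-in for the `SupplierThrows` type in the diff), the injectable clock, and `windowSeconds` are all hypothetical names. With `windowSeconds` set to 1 it behaves like the `now > lastQueriedAtEpochSeconds` check shown above.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of a time-windowed cache; not the implementation in this PR. */
final class SanityCacheSketch<T> {
    /** Supplier that may throw; a stand-in for the SupplierThrows type in the diff. */
    interface ThrowingSupplier<T, E extends Throwable> {
        T get() throws E;
    }

    private final Object lock = new Object();
    private final LongSupplier epochSeconds; // injectable clock so the sketch is testable
    private final long windowSeconds;        // the "1 second" delay, made configurable
    private long lastQueriedAtEpochSeconds = 0;
    private boolean hasResult = false;
    private T lastResult;

    SanityCacheSketch(LongSupplier epochSeconds, long windowSeconds) {
        this.epochSeconds = epochSeconds;
        this.windowSeconds = windowSeconds;
    }

    /** Runs the query at most once per window; otherwise returns the cached result. */
    <E extends Throwable> T get(ThrowingSupplier<T, E> query) throws E {
        synchronized (lock) {
            long now = epochSeconds.getAsLong();
            if (!hasResult || now - lastQueriedAtEpochSeconds >= windowSeconds) {
                lastResult = query.get();
                lastQueriedAtEpochSeconds = now;
                hasResult = true;
            }
            return lastResult;
        }
    }
}
```

In production the clock would be something like `() -> Instant.now().getEpochSecond()`; the injected clock here only exists to make the behavior easy to check.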
Co-authored-by: Fabien Thouny <[email protected]>
Can we not use properties in static fields? They require a JVM restart, which can be a challenge on a heavily active system. Is there any particular reason you don't want to do a property lookup where the integers get used?
Defining startup properties isn't generally bad, but if you're trying to tune the retry behavior it's a drag to have downtime every time you tweak it.
E.g. calling System.setProperty from the script console while tuning, and then setting the system property on startup once you have found an acceptable range of values.
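A small sketch of the difference being described, assuming a made-up property name and default (`example.github.minRetryIntervalMillis`, 100). Neither is taken from this PR. The static field is read once at class load, while the method re-reads the property on every call, so a `System.setProperty` from the script console takes effect immediately.

```java
/** Hypothetical sketch contrasting static and point-of-use property lookup. */
final class RetryTuningSketch {
    // Assumption: an illustrative property name, not one defined by this PR.
    static final String MIN_RETRY_PROPERTY = "example.github.minRetryIntervalMillis";

    // Read once when the class is initialized; later changes need a JVM restart.
    static final int STATIC_MIN_RETRY = Integer.getInteger(MIN_RETRY_PROPERTY, 100);

    // Read where the value is used; picks up System.setProperty changes at runtime.
    static int currentMinRetry() {
        return Integer.getInteger(MIN_RETRY_PROPERTY, 100);
    }
}
```

`Integer.getInteger` returns the default when the property is unset or not a valid integer, so a typo in the script console falls back to the default rather than throwing.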
@@ -632,11 +651,15 @@ private static IOException interpretApiError(IOException e,

private static void logRetryConnectionError(IOException e, URL url, int retries) throws IOException {
    // There are a range of connection errors where we want to wait a moment and just automatically retry
    long sleepTime = minRetryInterval;
I suggest setting minRetryInterval and maxRetryInterval via Integer.getInteger property here.
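One way this suggestion could look, as a rough sketch rather than the PR's code: both bounds are read via `Integer.getInteger` at the point of use and a sleep time is picked uniformly between them. The property names, the 1000 ms and 3000 ms defaults, and the choice of a uniform random interval are all assumptions made for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch: choose a retry sleep time between two tunable bounds. */
final class RetrySleepSketch {
    // Assumption: illustrative property names, not ones defined by this PR.
    static final String MIN_PROP = "example.retry.minIntervalMillis";
    static final String MAX_PROP = "example.retry.maxIntervalMillis";

    /** Looks up both bounds on every call so they can be tuned without a restart. */
    static long nextSleepMillis() {
        long min = Math.max(0, Integer.getInteger(MIN_PROP, 1000));
        // Guard against a misconfigured max that is below min.
        long max = Math.max(min, Integer.getInteger(MAX_PROP, 3000));
        // nextLong's upper bound is exclusive, so add 1 to make max reachable.
        return ThreadLocalRandom.current().nextLong(min, max + 1);
    }
}
```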
LOGGER.log(INFO,
        e.getMessage() + " while connecting to " + url + ". Sleeping " + GitHubClient.retryTimeoutMillis
        e.getMessage() + " while connecting to " + url + ". Sleeping " + sleepTime
I recommend creating a trace ID here for debug logging. That way an admin can search debug logs and find related logs for a single retry sequence. Here's an example
- log https://github.com/jenkinsci/scm-filter-jervis-plugin/blob/80e28289ede66fa18553b4e0ca5f518a4bd782bc/src/main/groovy/net/gleske/scmfilter/impl/trait/JervisFilterTrait.groovy#L144
- creating trace ID via sha256sum https://github.com/jenkinsci/scm-filter-jervis-plugin/blob/80e28289ede66fa18553b4e0ca5f518a4bd782bc/src/main/groovy/net/gleske/scmfilter/impl/trait/JervisFilterTrait.groovy#L245
When I enable debug logging for the class mentioned above, it is so active in parallel that all of the logs come in out of order. Because of that, using the trace ID as a prefix on all of the logs enables me to search for a series of logs along with their retries. It also enabled me to find the maximum retry count across logs, which helps an admin with tuning.
For example, I default to 30 retries in my class, but I found that in practice with GitHub it could retry up to 28 times. Because that was so close to the max retry limit, I increased the retry limit to 60 in my particular setup.
I also set the minimum time between retries to 1000ms and the maximum to 3000ms. I've found GitHub requiring me to retry for up to 1 minute in these scenarios because of secondary API limits.
The new secondary API limits are very aggressive at the moment.
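A rough Java sketch of the trace ID idea, under stated assumptions: the linked example is Groovy, and this is not a port of it. `TraceIdSketch`, the 8-character prefix length, and the log line format are hypothetical. The ID is the first four bytes of a SHA-256 digest of some seed, rendered as hex and prefixed to every message in one retry sequence.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: a short SHA-256-based trace ID for grouping retry logs. */
final class TraceIdSketch {
    /** Returns the first 8 hex characters of SHA-256(seed). */
    static String traceId(String seed) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(seed.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                // Mask to treat the byte as unsigned before formatting.
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    /** Prefixes a message so all attempts of one sequence share a searchable tag. */
    static String format(String traceId, int attempt, String message) {
        return "[trace-" + traceId + "] retry " + attempt + ": " + message;
    }
}
```

In practice the seed would need to be unique per retry sequence, for example the URL combined with `System.nanoTime()` and the thread name; that choice is also an assumption.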
I realize the logging mechanisms have to change a little bit with my feedback; they're not as straightforward as making the change just in this area of code.
I was thinking property lookup is slow and I don't want to do it per query, but compared to query times it's probably negligible. I see your point about downtime. This isn't intended as a long-term feature as it stands right now, and I want the least possible change to existing behavior. I'd be okay with saying that if any of the environment variables are set on startup, then they will be checked for every query. So you have to opt in to the behavior on startup.
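A minimal sketch of the opt-in behavior described here, assuming a hypothetical property name and default; it is not the code that was merged. Whether the property is present is captured once at construction (standing in for startup), and only then is it re-read on each call.

```java
/** Hypothetical sketch: per-query property lookup only when opted in at startup. */
final class OptInRetrySettings {
    // Assumption: illustrative name and default, not taken from this PR.
    static final String MIN_PROP = "example.optin.minRetryMillis";
    static final int DEFAULT_MIN_MILLIS = 100;

    private final boolean optedIn;

    OptInRetrySettings() {
        // Captured once; setting the property later does not opt in.
        this.optedIn = System.getProperty(MIN_PROP) != null;
    }

    /** Re-reads the property on every call if opted in; otherwise uses the default. */
    int minRetryMillis() {
        return optedIn ? Integer.getInteger(MIN_PROP, DEFAULT_MIN_MILLIS) : DEFAULT_MIN_MILLIS;
    }
}
```

This keeps existing behavior unchanged for anyone who never sets the property, at the cost of needing one restart to opt in.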
Being tunable at all is a plus, really. For pipeline API interactions we're moving to a weird but workable hack where we obtain one of 10 flocks at random, meaning there can be up to 10 clients active with GitHub at a time in pipelines. That's how bad it is; we're at that point. We're pegged against GH limits, so I do think property lookup is negligible. Either way, if that's how it is, we can work with it. It solves a critical issue on our end with dropped pipeline jobs not being created when they should.
@samrocketman You said:
Perhaps, but why add the complexity? My thought right now is this is purely a sanity check. One second is sane and simple.
@bitwiseman This comment was from me: I was not sure if you wanted to adopt an iterative approach or to anticipate all the needs ;) But it's perfectly fine like that! For my information, once it's merged and released, a new version of this plugin would be needed for use in Jenkins, right? Thanks a lot!
Yes, iterative.
Yes.
@samrocketman
f9d9621 to f01ddc8
@samrocketman
Away on holiday at the moment so can't easily review from mobile, but I'll take a look.
These changes look good to me. Thanks for implementing and accommodating feedback. Interesting use of a thread-local string; I learn something in every review in Jenkins projects.
Released in 1.318.
I subscribed to the github-api plugin repo for releases.
Description
getRateLimit() and getMeta()
Mitigate the most severe parts of #1728
@samrocketman @KeepItSimpleStupid
Please take a look.