-
Notifications
You must be signed in to change notification settings - Fork 578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix network stack stability issues #10080
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
While analyzing a possible memory leak, we encountered several coroutine exception messages, which unfortunately do not provide any information about what exactly went wrong, as exception diagnostics were previously only logged at the notice level.
Although there is locking involved here, it shoudln't take too long for the thread to actually acquire it, since there aren't that many threads dealing with endpoint clients concurrently. It's just wasting pointless time trying to obtain a CPU slot.
Closing and re-opening that very same log file shouldn't reset the counter, otherwise some log files may exceed the max limit per file as their offset indicator is reset each time they are re-opened.
…ging On incoming connection timeout we log the remote endpoint which isn't available if it was already disconnected - an exception is thrown. Get it as long as we're still connected not to lose it, nor to get an exception.
When a HTTP connection dies prematurely while the response is sent, `http::async_write()` sets the error code to something like broken pipe for example. When calling `async_flush()` afterwards, it sometimes happens that this never returns. This results in a resource leak as the coroutine isn't cleaned up. This commit makes the individual functions throw exceptions instead of silently ignoring the errors, resulting in the function terminating early and also resulting in an error being logged as well.
icinga-probot
bot
added
area/api
REST API
area/distributed
Distributed monitoring (master, satellites, clients)
bug
Something isn't working
core/quality
Improve code, libraries, algorithms, inline docs
ref/IP
labels
Jun 10, 2024
yhabteab
reviewed
Nov 13, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rebase!
When the `Desconnect()` method is called, clients are not disconnected immediately. Instead, a new coroutine is spawned using the same strand as the other coroutines. This coroutine calls `async_shutdown` on the TCP socket, which might be blocking. However, in order not to block indefintely, the `Timeout` class cancels all operations on the socket after `10` seconds. Though, the timeout does not trigger the handler immediately; it creates spawns another coroutine using the same strand as in the `JsonRpcConnection` class. This can cause unexpected delays if e.g. `HandleIncomingMessages` gets resumed before the coroutine from the timeout class. Apart from that, the coroutine for writing messages uses the same condition, making the two symmetrical.
…global thread pool
yhabteab
approved these changes
Nov 14, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/api
REST API
area/distributed
Distributed monitoring (master, satellites, clients)
bug
Something isn't working
cla/signed
core/quality
Improve code, libraries, algorithms, inline docs
ref/IP
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport of
CpuBoundWork
usage inJsonRpcConnection::Disconnect()
#9992CpuBoundWork
usages inlib/remote
#9994m_LogMessageCount
when rotating #9999CpuBoundWork
usage #10140