Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deliver async reports from a goroutine pool #230

Closed
wants to merge 3 commits into from

Conversation

opsengine
Copy link

@opsengine opsengine commented May 9, 2024

Goal

The goal is to avoid creating a new goroutine for each report when using asynchronous mode. The motivation is that an application that creates more reports than the server can handle, may experience high CPU and memory utilization, which can cause further instability in the application.

This problem has been reported before #49

Design

Allocate a pool of worker goroutines. When asynchronous mode is enabled, Notify submits the reports to the pool in a non-blocking fashion. If pool is full, the report will be dropped. This approach maintains the asynchronous behavior while avoiding unlimited proliferation of goroutines.

Changeset

Testing

## 2.3.1 (2024-03-18)

### Bug fixes

* Handle empty pointers to complex structs in metadata.Add
  [bugsnag#221](bugsnag#221)
## 2.4.0 (2024-04-15)

### Enhancements

* Sanitize for metadata should also handler json and []byte
  [bugsnag#226](bugsnag#226)
  [Chris Duncan](https://github.com/veqryn)
@opsengine opsengine force-pushed the async-runner branch 6 times, most recently from 4eb61dc to 92fd13f Compare May 9, 2024 04:25
@opsengine opsengine changed the title Async runner Deliver async reports from a goroutine pool May 9, 2024
@opsengine
Copy link
Author

opsengine commented May 9, 2024

Hi @DariaKunoichi, can I get some initial feedback on this idea?

v2/go.mod Outdated Show resolved Hide resolved
v2/async_runner.go Outdated Show resolved Hide resolved
@opsengine opsengine marked this pull request as ready for review May 9, 2024 17:54
@DariaKunoichi
Copy link
Contributor

Hi @opsengine, thank you for preparing two solutions for this problem.
Unfortunately we can't use solution with generics because we still need to support golang's older versions.
Goroutine pool solution using pond is good but we need to consider how introducing new dependency can affect other users. They may already have a pooling library in use or maybe they want to have full control of the configuration.
The solution we are thinking about is providing a way for users to set their own wrapper for the event delivery. That way users with no performance problems don't have to change anything and can still use old solution - but those concerned can just provide their own async pool's submit function and we will wrap delivery with it. That way user is fully responsible for thread pool configuration and we don't have to add a dependency.

@opsengine
Copy link
Author

Unfortunately we can't use solution with generics because we still need to support golang's older versions.

Understood. I figured that backward compatibility needs to be preserved and discarded that idea.

Goroutine pool solution using pond is good but we need to consider how introducing new dependency can affect other users

Using pond with a large value of NumGoroutines is functionally equivalent to the old behavior of unlimited goroutines. I don't see how the caller application could notice the difference. So to address your concern we could set the default value to something higher.

They may already have a pooling library in use or maybe they want to have full control of the configuration.

A caller that wants to use its own pooling library will enable synchronous mode, so the pond pool will not be used.
We considered writing a wrapper library that manages its own goroutine pool, but figured that if bugsnag offered this feature out of the box, the caller would not need to write any special logic, and just tweak the pool/queue size.

The solution we are thinking about is providing a way for users to set their own wrapper for the event delivery.

This would shift more responsibility to the caller. As a user I would very much prefer that the library did the pooling by default and exposed the relative configs. In other words, it's preferable that the library protects the caller from unlimited goroutine growth as a default behavior, instead of providing ways to address the issue when it has already happened. Dropping some bug reports is much better than compromising the application stability further.

@DariaKunoichi
Copy link
Contributor

Okay, let me explain in more detail.
I agree with the solution of providing a default limiter for the goroutines.
My team members raised concerns that introducing a dependency library for thread pooling may, for some users, collide with other pooling libraries used.
As we don't want to introduce conflicts in the dependencies we may need to write our own goroutine dispatcher with a channel of payloads (or similar solution).
However we don't currently have time scheduled for working on this larger change and we would like to consider it alongside some other improvements to delivery for bugsnag-go.
That's why we looked at an interim solution - to allow users to provide their own executors for delivery - while we schedule a more complete solution, with default limits, in due course.

@opsengine
Copy link
Author

My team members raised concerns that introducing a dependency library for thread pooling may, for some users, collide with other pooling libraries used.

Can you give an example? I can't think of a situation where this would become problematic.

Two alternative ideas to fix this upstream:

  1. Rate limiting. User sets how many reports per second are allowed.
  2. Circuit breaker. If a number of consecutive reports fails to be delivered, disable bug reporting for a few seconds.

If none of these solutions are possible, we will have to either wrap the library or (worse) fork it.

@tomlongridge
Copy link
Contributor

@opsengine – in the immediate term I would suggest forking this repo and using your solution would be the way to go for your usage.

Whilst we see the benefits of your PR, and I appreciate you taking the time to raise it, we need more time than we have available to our team to consider the impact in full to all users of this SDK. There are a few improvements, including this one, to payload delivery that we'd like to make and I'll look to get them included in our Q3 plan at which point your PR will form part of the discussions.

In the meantime, we'll leave the PR open for future discussion and potential use by other users and pick it up again in due course.

@hannah-smartbear
Copy link

Hi @opsengine,

Just wanted to let you know that we have released some changes to async event delivery in v2.5.0 of bugsnag-go. The notifier will now use a single goroutine with a channel to queue events. Please do let us know if you have any questions about this.

@hannah-smartbear hannah-smartbear added the released This feature/bug fix has been released label Aug 28, 2024
@opsengine
Copy link
Author

@hannah-smartbear that's great, thank you! is this feature enabled by default or do I have to edit the configuration?

do I read it correctly that the client will enqueue up to 100 reports and deliver them sequentially to the bugsnag server?

@hannah-smartbear
Copy link

Hi @opsengine,

A single goroutine for async delivery is now enabled by default. This will indeed queue up to 100 events, with any new events added while the queue is full to be dropped.

Please let us know if you have any further questions about this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
released This feature/bug fix has been released
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants