support a strict flush interval for all metrics #169

Draft · maciuszek wants to merge 19 commits into master from mattkuzminski/enforce-batching-for-all-metrics
Conversation

@maciuszek commented on Dec 19, 2024

Add more definite flush interval control, restricting when any metric is sent to this interval. This will implicitly introduce batching for timers.

@maciuszek marked this pull request as draft on December 19, 2024 21:09
@maciuszek force-pushed the mattkuzminski/enforce-batching-for-all-metrics branch from 01f9e4d to 384f462 on December 20, 2024 00:51
@sokada1221 left a comment


A few non-binding comments, as I may not have full context/understanding:

  • I think it's generally a good idea to implement batching with 2 parameters: batch size and interval. Since the interval is already configurable, it'd be nice to have the batch size configurable too, with the default being the magic approxMaxMemBytes/bufSize lol
  • How do you plan to test the change? Should we add some metrics/logs to verify the batching behavior?

```go
counter = sink.record
if !strings.Contains(counter, expected) {
	t.Error("wanted counter value of test.___f=i:1|c, got", counter)
}
expected = "test.__host=i:1|c"
```
@maciuszek (author) commented on Dec 23, 2024


Note: this test is out of scope of this work, but it was previously flaky: the ordering of reserved_tag vs test.__host was not deterministic.

net_sink.go (outdated):

```diff
@@ -118,8 +118,8 @@ func NewNetSink(opts ...SinkOption) FlushableSink {
 		bufSize = defaultBufferSizeTCP
 	}

-	s.outc = make(chan *bytes.Buffer, approxMaxMemBytes/bufSize)
+	s.retryc = make(chan *bytes.Buffer, 1) // It should be okay to limit this given we preferentially process from this over outc.
+	s.outc = make(chan *bytes.Buffer, approxMaxMemBytes/bufSize) // todo: need to understand why/how this number was chosen and probably elevate it
```
@maciuszek (author) commented on Dec 23, 2024


Forking #169 (review) to here to make threading possible. @sokada1221.

So this doesn't restrict the memory allocated, but rather the number of slots per metric/string.
If it's exceeded, it'll block more stats from being written until one is read; it wouldn't act as a batching mechanism 🤔. Buffered channels are a bit strange. I think in actuality this buffer will always be full, and if we changed it to a normal channel with no buffer (always blocking), I don't think we would see an impact.

@maciuszek force-pushed the mattkuzminski/enforce-batching-for-all-metrics branch from 972503c to 5a6293c on December 24, 2024 19:58
net_sink.go (outdated):

```go
	}
}
return batch[:0]
```
A contributor commented:

If there is an error in send this will cause any following batched metrics to be dropped. Might be better to keep them queued. Additionally, this prevents us from putting any buffers received after the error back into the buffer pool (which is only a problem if we decide to keep the existing logic).

@maciuszek (author) commented on Dec 30, 2024


If there's an error in send, the metrics will be written to the retryc channel, so I think nothing should be dropped despite clearing them from the batch. In the current state that retry handling escapes batching altogether; I'll think on this.

Nevermind: I see that any failure prevents all subsequent sends in the current iteration. Good catch, thanks.

@maciuszek force-pushed the mattkuzminski/enforce-batching-for-all-metrics branch from 76f2891 to c3b9a08 on December 30, 2024 15:21
@maciuszek force-pushed the mattkuzminski/enforce-batching-for-all-metrics branch from 9e755e1 to b7f0a3d on December 31, 2024 03:58
add more comments (will need to clean up later)