transport: reduce memory usage for small reads in server handler #7972

Closed
wants to merge 1 commit

Conversation


@lqs commented Dec 26, 2024

This pull request optimizes memory usage in the HandleStreams method. Previously, a 16KB buffer (http2MaxFrameLen) was allocated for each request and held until the call completed. With many concurrent clients running long-lived requests, this wasted memory, since each in-flight request pinned a full 16KB buffer for its entire lifetime.

Changes

  • In serverHandlerTransport.HandleStreams, small reads (n < http2MaxFrameLen/4) are now copied into smaller buffers, allowing the larger buffers to be returned to the pool sooner (a rough sketch of this pattern follows below).
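For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the copy-on-small-read idea. It is not the actual diff; the pool, package, and function names (readPool, readChunk) are invented for illustration.

```go
package handlersketch // illustrative sketch; not the actual handler_server.go change

import (
	"io"
	"sync"
)

const http2MaxFrameLen = 16384

// readPool is a hypothetical pool of 16KB read buffers.
var readPool = sync.Pool{
	New: func() any { return make([]byte, http2MaxFrameLen) },
}

// readChunk reads one chunk from r. Small reads (n < http2MaxFrameLen/4) are
// copied into a right-sized buffer so the pooled 16KB buffer can be returned
// immediately instead of staying pinned until the RPC completes.
func readChunk(r io.Reader) ([]byte, error) {
	buf := readPool.Get().([]byte)
	n, err := r.Read(buf)
	if n < http2MaxFrameLen/4 {
		small := make([]byte, n)
		copy(small, buf[:n])
		readPool.Put(buf) // recycle the large buffer right away
		return small, err
	}
	// Large read: the caller keeps the pooled buffer and returns it to the
	// pool once the message has been consumed.
	return buf[:n], err
}
```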


codecov bot commented Dec 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.07%. Comparing base (724f450) to head (3099ffd).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7972      +/-   ##
==========================================
- Coverage   82.28%   82.07%   -0.22%     
==========================================
  Files         381      381              
  Lines       38539    38546       +7     
==========================================
- Hits        31712    31636      -76     
- Misses       5535     5591      +56     
- Partials     1292     1319      +27     
Files with missing lines Coverage Δ
internal/transport/handler_server.go 86.77% <100.00%> (+0.30%) ⬆️

... and 27 files with indirect coverage changes

@arjan-bal
Contributor

arjan-bal commented Dec 26, 2024

Hi, @lqs, thanks for sending this PR. Do you have benchmarks showing the effect of this optimization?

We are incurring an extra copy when the bytes read are <= 4KB. We need to decide if the time spent is worth the memory savings.

@arjan-bal arjan-bal added Status: Requires Reporter Clarification Type: Performance Performance improvements (CPU, network, memory, etc) Area: Transport Includes HTTP/2 client/server and HTTP server handler transports and advanced transport features. labels Dec 26, 2024
@lqs
Author

lqs commented Dec 27, 2024

Hi, @arjan-bal, thanks for the feedback.

Most requests in normal cases are only a few dozen bytes, so the extra copy has almost no impact. Even in the worst case, where each frame is exactly 4KB, the copy would add about 200ns on modern hardware.
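A throwaway Go micro-benchmark along these lines could sanity-check that estimate (not part of this PR; 4096 reflects the http2MaxFrameLen/4 threshold, and results will vary by hardware):

```go
package copybench

import "testing"

// BenchmarkCopy4KB measures the worst-case overhead of the copy-on-small-read
// path: one extra allocation plus a copy of a 4KB chunk.
func BenchmarkCopy4KB(b *testing.B) {
	src := make([]byte, 4096)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		dst := make([]byte, len(src))
		copy(dst, src)
	}
}
```

Run with `go test -bench=Copy4KB` to compare against the 200ns figure.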

If this is a concern, we could reduce the threshold, for example, only copying when the data is smaller than 256 bytes.

@arjan-bal
Contributor

gRPC needs to work well with workloads of all types, both large and small messages. If optimizing for specific workloads causes a negative impact on others, we need to weigh the benefits carefully. @dfawley, I would like to get your thoughts here.

@lqs
Author

lqs commented Dec 31, 2024

If we want to reduce the impact on other workloads, we could lower the copy threshold further (e.g., 32 bytes) or start with smaller buffers and grow them as needed. Would these approaches work better?
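To make the second idea concrete, here is a rough sketch of a grow-as-needed reader (purely illustrative; the type and constants are invented, not grpc-go code):

```go
package growsketch

import "io"

const maxReadSize = 16 * 1024 // http2MaxFrameLen

// adaptiveReader starts with a small buffer and doubles it whenever a read
// fills the current buffer, so streams carrying only tiny messages never pin
// a full 16KB allocation.
type adaptiveReader struct {
	r    io.Reader
	size int
}

func (a *adaptiveReader) read() ([]byte, error) {
	if a.size == 0 {
		a.size = 256 // start at the smaller threshold suggested above
	}
	buf := make([]byte, a.size)
	n, err := a.r.Read(buf)
	if n == len(buf) && a.size < maxReadSize {
		a.size *= 2 // buffer filled; try a larger one on the next read
	}
	return buf[:n], err
}
```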

@arjan-bal
Contributor

Hi @lqs, sorry for the delay; Doug is on vacation and should be back next week. I would like to get his input on this change.

@dfawley
Member

dfawley commented Jan 7, 2025

Generally speaking, these buffers should be short-lived:

  1. For unary RPCs (or server-streaming), we'll ~immediately invoke the codec, free the buffer, and call the handler with the output of the codec.
  2. For other streaming RPCs, this buffer will be recycled as soon as the server handler receives the message from the stream.

That said, if you are queuing up lots of small messages on one stream, then this could potentially add up. Can we use ContentLength and get a buffer of the proper size in the first place?
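A minimal sketch of that idea, assuming the handler transport can consult the request's ContentLength (names are illustrative, not the grpc-go internals; ContentLength is -1 when the body length is unknown, e.g. for streaming requests):

```go
package clsketch

import "net/http"

const http2MaxFrameLen = 16384

// initialBufSize picks the first read buffer size from the declared request
// body length, falling back to a full frame when the length is unknown or
// too large.
func initialBufSize(req *http.Request) int {
	if cl := req.ContentLength; cl > 0 && cl < http2MaxFrameLen {
		return int(cl)
	}
	return http2MaxFrameLen
}
```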

@dfawley dfawley assigned lqs and unassigned dfawley Jan 7, 2025
@lqs
Author

lqs commented Jan 10, 2025

Hi @dfawley
Thanks for the update on this. It seems like the real issue is that the buffer isn’t being released at the expected time; instead, it’s freed only after the handler completes. Addressing that would resolve the problem. I’ll close this PR for now.
