transport: reduce memory usage for small reads in server handler #7972

Closed
wants to merge 1 commit

Conversation


@lqs commented Dec 26, 2024

This pull request optimizes memory usage in the HandleStreams method. Previously, a 16KB buffer (http2MaxFrameLen) was allocated for each request and held until the call completed. With many concurrent clients running long-lived requests, this wasted memory, since each in-flight request pinned a full 16KB buffer for its entire lifetime.

Changes

  • In serverHandlerTransport.HandleStreams, small reads (n < http2MaxFrameLen/4) are now copied into smaller buffers, allowing the larger buffers to be returned to the pool sooner (a rough sketch of this pattern follows below).
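For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the copy-on-small-read idea. It is not the actual diff; the pool, package, and function names (readPool, readChunk) are invented for illustration.

```go
package handlersketch // illustrative sketch; not the actual handler_server.go change

import (
	"io"
	"sync"
)

const http2MaxFrameLen = 16384

// readPool is a hypothetical pool of 16KB read buffers.
var readPool = sync.Pool{
	New: func() any { return make([]byte, http2MaxFrameLen) },
}

// readChunk reads one chunk from r. Small reads (n < http2MaxFrameLen/4) are
// copied into a right-sized buffer so the pooled 16KB buffer can be returned
// immediately instead of staying pinned until the RPC completes.
func readChunk(r io.Reader) ([]byte, error) {
	buf := readPool.Get().([]byte)
	n, err := r.Read(buf)
	if n < http2MaxFrameLen/4 {
		small := make([]byte, n)
		copy(small, buf[:n])
		readPool.Put(buf) // recycle the large buffer right away
		return small, err
	}
	// Large read: the caller keeps the pooled buffer and returns it to the
	// pool once the message has been consumed.
	return buf[:n], err
}
```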


codecov bot commented Dec 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.07%. Comparing base (724f450) to head (3099ffd).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7972      +/-   ##
==========================================
- Coverage   82.28%   82.07%   -0.22%     
==========================================
  Files         381      381              
  Lines       38539    38546       +7     
==========================================
- Hits        31712    31636      -76     
- Misses       5535     5591      +56     
- Partials     1292     1319      +27     
Files with missing lines Coverage Δ
internal/transport/handler_server.go 86.77% <100.00%> (+0.30%) ⬆️

... and 27 files with indirect coverage changes

@arjan-bal
Contributor

arjan-bal commented Dec 26, 2024

Hi, @lqs, thanks for sending this PR. Do you have benchmarks showing the effect of this optimization?

We are incurring an extra copy when the bytes read are <= 4KB. We need to decide if the time spent is worth the memory savings.

@arjan-bal arjan-bal added Status: Requires Reporter Clarification Type: Performance Performance improvements (CPU, network, memory, etc) Area: Transport Includes HTTP/2 client/server and HTTP server handler transports and advanced transport features. labels Dec 26, 2024
@lqs
Author

lqs commented Dec 27, 2024

Hi, @arjan-bal, thanks for the feedback.

Most requests in normal cases are only a few dozen bytes, so the extra copy has almost no impact. Even in the worst case, where each frame is exactly 4KB, the copy would add about 200ns on modern hardware.
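A throwaway Go micro-benchmark along these lines could sanity-check that estimate (not part of this PR; 4096 reflects the http2MaxFrameLen/4 threshold, and results will vary by hardware):

```go
package copybench

import "testing"

// BenchmarkCopy4KB measures the worst-case overhead of the copy-on-small-read
// path: one extra allocation plus a copy of a 4KB chunk.
func BenchmarkCopy4KB(b *testing.B) {
	src := make([]byte, 4096)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		dst := make([]byte, len(src))
		copy(dst, src)
	}
}
```

Run with `go test -bench=Copy4KB` to compare against the 200ns figure.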

If this is a concern, we could reduce the threshold, for example, only copying when the data is smaller than 256 bytes.

@arjan-bal
Contributor

gRPC needs to work well with workloads of all types, both large and small messages. If optimizing for specific workloads causes a negative impact on others, we need to weigh the benefits carefully. @dfawley, I would like to get your thoughts here.

@lqs
Author

lqs commented Dec 31, 2024

If we want to reduce the impact on other workloads, we could lower the copy threshold further (e.g., 32 bytes) or start with smaller buffers and grow them as needed. Would these approaches work better?
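To make the second idea concrete, here is a rough sketch of a grow-as-needed reader (purely illustrative; the type and constants are invented, not grpc-go code):

```go
package growsketch

import "io"

const maxReadSize = 16 * 1024 // http2MaxFrameLen

// adaptiveReader starts with a small buffer and doubles it whenever a read
// fills the current buffer, so streams carrying only tiny messages never pin
// a full 16KB allocation.
type adaptiveReader struct {
	r    io.Reader
	size int
}

func (a *adaptiveReader) read() ([]byte, error) {
	if a.size == 0 {
		a.size = 256 // start at the smaller threshold suggested above
	}
	buf := make([]byte, a.size)
	n, err := a.r.Read(buf)
	if n == len(buf) && a.size < maxReadSize {
		a.size *= 2 // buffer filled; try a larger one on the next read
	}
	return buf[:n], err
}
```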

@arjan-bal
Contributor

Hi @lqs, sorry for the delay; Doug is on vacation and should be back next week. I would like to get his input on this change.

@dfawley
Member

dfawley commented Jan 7, 2025

Generally speaking, these buffers should be short-lived:

  1. For unary RPCs (or server-streaming), we'll ~immediately invoke the codec, free the buffer, and call the handler with the output of the codec.
  2. For other streaming RPCs, this buffer will be recycled as soon as the server handler receives the message from the stream.

That said, if you are queuing up lots of small messages on one stream, then this could potentially add up. Can we use ContentLength and get a buffer of the proper size in the first place?
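A minimal sketch of that idea, assuming the handler transport can consult the request's ContentLength (names are illustrative, not the grpc-go internals; ContentLength is -1 when the body length is unknown, e.g. for streaming requests):

```go
package clsketch

import "net/http"

const http2MaxFrameLen = 16384

// initialBufSize picks the first read buffer size from the declared request
// body length, falling back to a full frame when the length is unknown or
// too large.
func initialBufSize(req *http.Request) int {
	if cl := req.ContentLength; cl > 0 && cl < http2MaxFrameLen {
		return int(cl)
	}
	return http2MaxFrameLen
}
```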

@dfawley dfawley assigned lqs and unassigned dfawley Jan 7, 2025
@lqs
Author

lqs commented Jan 10, 2025

Hi @dfawley
Thanks for the update on this. It seems like the real issue is that the buffer isn’t being released at the expected time; instead, it’s freed only after the handler completes. Addressing that would resolve the problem. I’ll close this PR for now.
