-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: fix buffer release timing in processUnaryRPC #7998
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #7998 +/- ##
==========================================
+ Coverage 81.94% 82.14% +0.19%
==========================================
Files 381 381
Lines 38539 38548 +9
==========================================
+ Hits 31582 31664 +82
+ Misses 5626 5579 -47
+ Partials 1331 1305 -26
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice catch!
server.go
Outdated
df := func(v any) error { | ||
defer d.Free() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verified that the last reference to d
is in the closure, so the change is safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dfawley, the decoding func is called by generated code. The present version of the generated code always calls this in the first couple of lines.
Do you think there is a chance that it may not get called? If so, should we wrap d.Free()
in a OnceFunc
and defer call it outside the closure to be extra cautious?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can't guarantee that the decode function would be called by someone using the generic API to register their own handler, instead of the generated code. They could have their own interceptor-like pattern that checks headers and doesn't call the decoder if that fails.
So yes, I think for correctness reasons we must defer
a OnceFunc
-wrapped d.Free()
. It's not clear this would be a win for all cases. @lqs can you run our benchmarks before/after that change with 1-byte request/response sizes so that we can be sure it doesn't cause any surprises?
grpc-go/benchmark/benchmain/main.go
Lines 24 to 39 in 2d4daf3
go run benchmark/benchmain/main.go -benchtime=10s -workloads=all \ | |
-compression=gzip -maxConcurrentCalls=1 -trace=off \ | |
-reqSizeBytes=1,1048576 -respSizeBytes=1,1048576 -networkMode=Local \ | |
-cpuProfile=cpuProf -memProfile=memProf -memProfileRate=10000 -resultFile=result | |
As a suggestion, when creating a branch, you can run this benchmark and save the result | |
file "-resultFile=basePerf", and later when you at the middle of the work or finish the | |
work, you can get the benchmark result and compare it with the base anytime. | |
Assume there are two result files names as "basePerf" and "curPerf" created by adding | |
-resultFile=basePerf and -resultFile=curPerf. | |
To format the curPerf, run: | |
go run benchmark/benchresult/main.go curPerf | |
To observe how the performance changes based on a base result, run: | |
go run benchmark/benchresult/main.go basePerf curPerf |
Something like this would run just the set of conditions that should matter here:
go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary \
-compression=off -maxConcurrentCalls=100 -trace=off \
-reqSizeBytes=1 -respSizeBytes=1 -networkMode=Local -resultFile=[BEFORE/AFTER]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are the benchmark results, taken on a Ryzen 9 7945HX CPU. Each test group was run multiple times, and the fastest result was selected.
before
df := func(v any) error {
defer d.Free()
$ ~/sdk/go1.23.4/bin/go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary -compression=off -maxConcurrentCalls=100 -trace=off -reqSizeBytes=1 -respSizeBytes=1 -networkMode=Local -resultFile=before
go1.23.4/grpc1.71.0-dev
unary-networkMode_Local-bufConn_false-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false-clientReadBufferSize_-1-clientWriteBufferSize_-1-serverReadBufferSize_-1-serverWriteBufferSize_-1-sleepBetweenRPCs_0s-connections_1-recvBufferPool_nil-sharedWriteBuffer_false:
50_Latency: 530.3120µs 90_Latency: 747.7040µs 99_Latency: 1405.5230µs Avg_Latency: 581.5540µs Bytes/op: 10153.135931747744 Allocs/op: 178.97030961211047
Histogram (unit: µs)
Count: 10295689 Min: 110.7 Max: 41317.2 Avg: 581.55
------------------------------------------------------------
[ 110.655000, 110.656000) 1 0.0% 0.0%
[ 110.656000, 110.662016) 0 0.0% 0.0%
[ 110.662016, 110.704228) 0 0.0% 0.0%
[ 110.704228, 111.000400) 0 0.0% 0.0%
[ 111.000400, 113.078425) 0 0.0% 0.0%
[ 113.078425, 127.658446) 0 0.0% 0.0%
[ 127.658446, 229.956072) 164 0.0% 0.0%
[ 229.956072, 947.705647) 9740824 94.6% 94.6% #########
[ 947.705647, 5983.643190) 554125 5.4% 100.0% #
[ 5983.643190, 41317.230000) 574 0.0% 100.0%
[ 41317.230000, 289227.841760) 1 0.0% 100.0%
Number of requests: 10295689 Request throughput: 1.3727585333333334e+06 bit/s
Number of responses: 10295689 Response throughput: 1.3727585333333334e+06 bit/s
after
dataFree := grpcsync.OnceFunc(d.Free)
defer dataFree()
df := func(v any) error {
defer dataFree()
$ ~/sdk/go1.23.4/bin/go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary -compression=off -maxConcurrentCalls=100 -trace=off -reqSizeBytes=1 -respSizeBytes=1 -networkMode=Local -resultFile=after
go1.23.4/grpc1.71.0-dev
unary-networkMode_Local-bufConn_false-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false-clientReadBufferSize_-1-clientWriteBufferSize_-1-serverReadBufferSize_-1-serverWriteBufferSize_-1-sleepBetweenRPCs_0s-connections_1-recvBufferPool_nil-sharedWriteBuffer_false:
50_Latency: 533.2280µs 90_Latency: 744.5700µs 99_Latency: 1415.7970µs Avg_Latency: 584.1710µs Bytes/op: 10225.500515690499 Allocs/op: 181.97781262126145
Histogram (unit: µs)
Count: 10246456 Min: 64.9 Max: 39638.3 Avg: 584.17
------------------------------------------------------------
[ 64.876000, 64.877000) 1 0.0% 0.0%
[ 64.877000, 64.882985) 0 0.0% 0.0%
[ 64.882985, 64.924788) 0 0.0% 0.0%
[ 64.924788, 65.216775) 0 0.0% 0.0%
[ 65.216775, 67.256256) 0 0.0% 0.0%
[ 67.256256, 81.501688) 3 0.0% 0.0%
[ 81.501688, 181.003628) 142 0.0% 0.0%
[ 181.003628, 876.007898) 9584488 93.5% 93.5% #########
[ 876.007898, 5730.495526) 661038 6.5% 100.0% #
[ 5730.495526, 39638.273000) 783 0.0% 100.0%
[ 39638.273000, 276478.380825) 1 0.0% 100.0%
Number of requests: 10246456 Request throughput: 1.3661941333333333e+06 bit/s
Number of responses: 10246456 Response throughput: 1.3661941333333333e+06 bit/s
comparison
$ ~/sdk/go1.23.4/bin/go run benchmark/benchresult/main.go before after
unary-networkMode_Local-bufConn_false-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false-clientReadBufferSize_-1-clientWriteBufferSize_-1-serverReadBufferSize_-1-serverWriteBufferSize_-1-sleepBetweenRPCs_0s-connections_1-recvBufferPool_nil-sharedWriteBuffer_false
Title Before After Percentage
TotalOps 10295689 10246456 -0.48%
SendOps 0 0 NaN%
RecvOps 0 0 NaN%
Bytes/op 10153.14 10225.50 0.71%
Allocs/op 178.97 181.98 1.68%
ReqT/op 1372758.53 1366194.13 -0.48%
RespT/op 1372758.53 1366194.13 -0.48%
50th-Lat 530.312µs 533.228µs 0.55%
90th-Lat 747.704µs 744.57µs -0.42%
99th-Lat 1.405523ms 1.415797ms 0.73%
Avg-Lat 581.554µs 584.171µs 0.45%
GoVersion go1.23.4 go1.23.4
GrpcVersion 1.71.0-dev 1.71.0-dev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please update this PR with the OnceFunc
changes (which I believe we need for "correctness"*)? The results you shared aren't bad but also aren't great, but I'd like to experiment a bit.
* - Technically there is no correctness problem per-se, just re-use of memory that is lost. In theory we could do the change without the OnceFunc
, but then the memory wouldn't get reused if the buffer wasn't decoded, which feels worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated with the OnceFunc
in the latest commit.
Previously, the buffer
d
was held for the entire duration of the handler execution inprocessUnaryRPC
, leading to unnecessary memory retention. This PR adjusts the code to release the buffer immediately after the decoding functiondf
completes, ensuring memory is freed as soon as it is no longer needed.As a result, the workaround in #7972 to adjust the buffer size is no longer required, as this fix resolves the underlying issue.
RELEASE NOTES: