-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats/opentelemetry: Introduce Tracing API #7852
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7852 +/- ##
==========================================
- Coverage 82.09% 82.05% -0.04%
==========================================
Files 379 384 +5
Lines 38261 38652 +391
==========================================
+ Hits 31409 31716 +307
- Misses 5549 5611 +62
- Partials 1303 1325 +22
|
@@ -119,22 +134,46 @@ func (h *clientStatsHandler) streamInterceptor(ctx context.Context, desc *grpc.S | |||
} | |||
|
|||
startTime := time.Now() | |||
ctx, span := h.createCallTraceSpan(ctx, method) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should just not call createCallTraceSpan if tracing is disabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
@aranjans please update this with latest main branch to get rid of all go.mod and go.sum changes |
378ddd3
to
8cb8222
Compare
@purnesh42H Thanks for your review, I have addressed all your comments and this PR is ready for another pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more non-test comments. Will review tests in next pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more non-test comments. Will review tests in next pass
@aranjans you should link the gRFC and concise your description |
@purnesh42H I have addressed all the comments, and updated the description to link the grfc proposal. |
@purnesh42H I have addressed all your comments, and updated the PR. Kindly review the PR. |
…y_test to opentelemetry
stream.go
Outdated
@@ -213,13 +213,13 @@ func newClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, meth | |||
// Provide an opportunity for the first RPC to see the first service config | |||
// provided by the resolver. | |||
if err := cc.waitForResolvedAddrs(ctx); err != nil { | |||
cc.nameResolutionDelayed = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this trying to capture? I don't think this is what you want.
For example, if name resolution takes 30 seconds to occur, and there is no deadline on the RPC, then this will not ever be set.
Also, making this a boolean field on the ClientConn
can't be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its for capturing the delay in name resolution https://github.com/grpc/proposal/blob/master/A72-open-telemetry-tracing.md#tracing-information
I think we need to specifically look for context timeout error to set this? waitForResolvedAddrs doesn't directly calculate delay; it waits for resolved addresses, and if resolution takes too long, it will return a timeout error?
What is the expected behavior when there is no deadline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of the feature is to determine, if a name resolution delay occurred during the RPC's lifespan, how long that took.
So basically you want to add events to the lifecycle of the RPC when these things occur:
Lines 682 to 684 in e957825
select { | |
case <-cc.firstResolveEvent.Done(): | |
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dfawley and @purnesh42H for your thoughts, I will create a separate PR for this as a bonus feature.
@dfawley As per our offline discussion to exclude changes of adding event for name resolution delay, and create a separate PR. I have updated the PR. |
stream.go
Outdated
@@ -215,7 +215,6 @@ func newClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, meth | |||
if err := cc.waitForResolvedAddrs(ctx); err != nil { | |||
return nil, err | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please just revert this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
// TextMapPropagator propagates span context through text map carrier. | ||
TextMapPropagator propagation.TextMapPropagator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the user supposed to be able to override this?
We have a default for them, though, right? That should get documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah GRPCTraceBinPropagator is what user can set if they are in the migration path i.e. using opencensus but its not default. The recommendation is to use W3CContextPropagator provided by otel if they are not using opencensus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this go into experimental/opentelemetry
, or even experimental/stats/opentelemetry
? Either way there should be comments indicating that it will be moved (and where it will be moved to, and when).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
stats/opentelemetry/opentelemetry.go
Outdated
func init() { | ||
otelinternal.SetPluginOption = func(o *Options, po otelinternal.PluginOption) { | ||
o.MetricsOptions.pluginOption = po | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this back to where it was to avoid the code churn please?
@@ -66,6 +68,15 @@ func (h *serverStatsHandler) initializeMetrics() { | |||
rm.registerMetrics(metrics, meter) | |||
} | |||
|
|||
func (h *serverStatsHandler) initializeTracing() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this go into a new file? server_tracing.go
?
stats/opentelemetry/opentelemetry.go
Outdated
@@ -171,6 +176,14 @@ func removeLeadingSlash(mn string) string { | |||
return strings.TrimLeft(mn, "/") | |||
} | |||
|
|||
func isMetricsDisabled(mo MetricsOptions) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code now looks like:
if !isMetricsDisabled() {
}
It should instead be:
if isMetricsEnabled() {
}
to avoid a double negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
otel.SetTextMapPropagator(h.options.TraceOptions.TextMapPropagator) | ||
otel.SetTracerProvider(h.options.TraceOptions.TracerProvider) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, these are global settings. I don't think that's what we want... Only the user should ever be calling these kinds of function, not our library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dfawley you're right, ideally user should set these variables. I have updated the PR, kindly check.
stats/opentelemetry/opentelemetry.go
Outdated
return mo.MeterProvider == nil | ||
} | ||
|
||
func isTracingDisabled(to experimental.TraceOptions) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should discuss options, but requiring these to be called multiple times from the stats handler feels problematic. Maybe there's a tracing stats handler and a metrics stats handler and we combine them based on which ones are enabled?
At the very least, these should become methods so we don't have to keep reaching into our data structures when calling them to find the options fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dfawley I have updated the PR with the later approach, as it makes sense to me for this to be method of Options struct. Let me know if you think this can be improved.
} | ||
ri := &rpcInfo{ | ||
ai.startTime = startTime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change how this is set?
Why are you setting a local 3 lines above to time.Now() and then assigning this field to that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated now, thAnks.
ai := &attemptInfo{} | ||
startTime := time.Now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comments as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
This pull request adds the OpenTelemetry tracing API to the grpc-go opentelemetry plugin as outlined in proposal A72.
RELEASE NOTES:
stats/opentelemetry/experimental
. This includes the addition ofTraceOptions
in theOptions
struct to allow users to specify theTraceProvider
andTextMapPropagator
.