stats/opentelemetry: Introduce Tracing API #7852

Open: wants to merge 25 commits into base: master
25 commits
fc09e2c final rebase with master (aranjans, Dec 9, 2024)
2747371 update e2e tests (aranjans, Dec 9, 2024)
96f92a0 fix e2e tests (aranjans, Dec 10, 2024)
f650e37 remove fmt logs (aranjans, Dec 10, 2024)
175d4c5 move grpc_trace_bin_propagator to experimental (aranjans, Dec 11, 2024)
c30c44a move TraceOptions api to experimental (aranjans, Dec 11, 2024)
90ffa23 addressed purnesh's comments (aranjans, Dec 16, 2024)
2083830 don't change opentelemetry/e2e_test.go package name from opentelemetr… (aranjans, Dec 16, 2024)
e8e9d53 make vet happy (aranjans, Dec 16, 2024)
57fd38a update TestServerWithMetricsAndTraceOptions (aranjans, Dec 17, 2024)
b0aad8a fixed nits (aranjans, Dec 17, 2024)
8680ae8 fixed nits (aranjans, Dec 17, 2024)
98d11b7 fix: small nits (aranjans, Dec 18, 2024)
d0e1a0a Add test with metrics and traces disabled. (aranjans, Dec 19, 2024)
b6503f7 make vet happy (aranjans, Dec 19, 2024)
b2831f1 pull out logic to find name resolution delay (aranjans, Dec 20, 2024)
1cb4396 remove experimental notice from grpc_trace_bin_propagator (aranjans, Dec 20, 2024)
b8fe8db refactor and addressed comments from doug (aranjans, Dec 23, 2024)
d4ae3ab Add copyright notice to client_tracing.go (aranjans, Dec 23, 2024)
c705b97 let client set the propagator and trace provider (aranjans, Dec 24, 2024)
5571e3b fix breaking tests (aranjans, Dec 24, 2024)
6e06350 move call span creation to client_tracing.go (aranjans, Jan 2, 2025)
cfb92ae move grpc_trace_bin_propagator -> stats/opentelemetry (aranjans, Jan 13, 2025)
72e178b nits (aranjans, Jan 15, 2025)
5beafe1 disable tracing if textMapPropagator is not set, even if traceProvide… (aranjans, Jan 15, 2025)
37 changes: 37 additions & 0 deletions experimental/opentelemetry/trace_options.go
@@ -0,0 +1,37 @@
/*
* Copyright 2024 gRPC authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

// Package opentelemetry is EXPERIMENTAL and may be moved to the stats/opentelemetry package
// in a later release.
package opentelemetry

import (
"go.opentelemetry.io/otel/propagation"
"go.opentelemetry.io/otel/trace"
)

// TraceOptions are the tracing options for OpenTelemetry instrumentation.
type TraceOptions struct {
// TracerProvider is the OpenTelemetry tracer provider, which is required to
// record traces/trace spans for instrumentation. If unset, tracing
// will not be recorded.
TracerProvider trace.TracerProvider

// TextMapPropagator propagates span context through a text map carrier.
// If unset, context propagation will not occur, which may result in
// loss of trace context across service boundaries.
TextMapPropagator propagation.TextMapPropagator
Comment on lines +34 to +36

Member:
Is this the right behavior? Shouldn't we default to our TextMapPropagator implementation, rather than requiring it to be set or doing something suboptimal if it's not?

It seems like that would be my last choice for behavior. If it's unset, I'd rather either fail to initialize (if TracerProvider is also set) or use our default TextMapPropagator.

Contributor Author:
Updated.

Contributor (@purnesh42H, Jan 15, 2025):
@dfawley As per the proposal, we shouldn't use any default propagator. The recommendation from OpenTelemetry is to use W3CContextPropagator, which deals with text keys and values. The GRPCTraceBinPropagator that we have created is only supposed to be used for backward compatibility, if someone is using the OpenCensus plugin and wants to migrate to the OpenTelemetry plugin.

So, TextMapPropagator is a mandatory field which the user has to set in the dial option or server option. If they don't, the code won't be instrumented.

Member:
"So, TextMapPropagator is a mandatory field"

I'm fine with that, but then we should error at initialization time if TracerProvider is set and TextMapPropagator is not, shouldn't we? And if neither is set, then we can just not do any tracing.

Member:
Sorry, it looks like we can't error from this API anyway, since it just returns a DialOption/ServerOption.

Should we log a warning? If you're setting tracing but you don't have a propagator, that seems not very useful? Or is this actually a reasonable configuration? In our OpenCensus code, I don't think there's any way to disable the span propagation.

Contributor (@purnesh42H, Jan 16, 2025):
Yeah, I think logging a warning is reasonable if you have set only one of TracerProvider and TextMapPropagator.

"If you're setting tracing but you don't have a propagator, that seems not very useful? Or is this actually a reasonable configuration"

It's not reasonable. The code will just panic at some point if both are not set.

}
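The thread above settles on logging a warning, because a constructor that only returns a DialOption/ServerOption cannot report an error. A minimal sketch of that check follows, assuming hypothetical stand-in interfaces `tracerProvider` and `textMapPropagator` (not the real trace.TracerProvider and propagation.TextMapPropagator method sets) and a hypothetical `tracingEnabled` helper. It treats tracing as enabled only when both fields are set, in line with the last commit message ("disable tracing if textMapPropagator is not set…").

```go
package main

import (
	"fmt"
	"log"
)

// tracerProvider and textMapPropagator are hypothetical stand-ins for the
// OpenTelemetry interfaces, so this sketch builds with the standard library only.
type tracerProvider interface{ Name() string }
type textMapPropagator interface{ Fields() []string }

// traceOptions mirrors the shape of TraceOptions using the stand-in types.
type traceOptions struct {
	TracerProvider    tracerProvider
	TextMapPropagator textMapPropagator
}

// tracingEnabled (hypothetical) requires both fields. If exactly one is set,
// it logs a warning instead of failing, since option constructors cannot
// return an error.
func tracingEnabled(o traceOptions) bool {
	hasTP := o.TracerProvider != nil
	hasProp := o.TextMapPropagator != nil
	if hasTP != hasProp {
		log.Printf("WARNING: tracing disabled: TracerProvider set=%v, TextMapPropagator set=%v; both are required", hasTP, hasProp)
	}
	return hasTP && hasProp
}

// fakeTP and fakeProp are hypothetical test doubles.
type fakeTP struct{}

func (fakeTP) Name() string { return "fake" }

type fakeProp struct{}

func (fakeProp) Fields() []string { return []string{"traceparent"} }

func main() {
	fmt.Println(tracingEnabled(traceOptions{}))                         // false: nothing set, no warning
	fmt.Println(tracingEnabled(traceOptions{TracerProvider: fakeTP{}})) // false: warning logged
	fmt.Println(tracingEnabled(traceOptions{TracerProvider: fakeTP{}, TextMapPropagator: fakeProp{}})) // true
}
```

Whether a half-configured TraceOptions should warn or silently disable tracing is the open design point in the thread; the sketch picks "warn and disable".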
63 changes: 46 additions & 17 deletions stats/opentelemetry/client_metrics.go
@@ -21,7 +21,10 @@
"sync/atomic"
"time"

otelcodes "go.opentelemetry.io/otel/codes"
"go.opentelemetry.io/otel/trace"
"google.golang.org/grpc"
grpccodes "google.golang.org/grpc/codes"
estats "google.golang.org/grpc/experimental/stats"
istats "google.golang.org/grpc/internal/stats"
"google.golang.org/grpc/metadata"
@@ -85,8 +88,12 @@
}

startTime := time.Now()
var span trace.Span
if h.options.isTracingEnabled() {
ctx, span = h.createCallTraceSpan(ctx, method)
}
err := invoker(ctx, method, req, reply, cc, opts...)
h.perCallMetrics(ctx, err, startTime, ci)
h.perCallTracesAndMetrics(ctx, err, startTime, ci, span)
Member:
@purnesh42H

How much code and data is shared between the tracing and metrics implementations? I believe they share almost nothing?

Would it make sense to implement a separate set of interceptors for tracing and metrics so that we can avoid checking which one is enabled for every event in the life of an RPC? I think the code would end up simpler, cleaner, and easier to review that way, too.

Contributor:
@dfawley They only share the DialOption and ServerOption, as per the proposal. Are you suggesting having 4 interceptors internally, unary and stream for each of metrics and traces? If/when we add logging, we will add 2 more. But yes, we would have a separate path for each and then wouldn't have to check whether the other one is enabled as well.

Member:
Yes, I think it would be helpful to have separate handlers for the different functionality, since they share nothing in common and can be enabled/disabled independently.

return err
}

@@ -119,22 +126,37 @@
}

startTime := time.Now()

var span trace.Span
if h.options.isTracingEnabled() {
ctx, span = h.createCallTraceSpan(ctx, method)
}
callback := func(err error) {
h.perCallMetrics(ctx, err, startTime, ci)
h.perCallTracesAndMetrics(ctx, err, startTime, ci, span)
}
opts = append([]grpc.CallOption{grpc.OnFinish(callback)}, opts...)
return streamer(ctx, desc, cc, method, opts...)
}

func (h *clientStatsHandler) perCallMetrics(ctx context.Context, err error, startTime time.Time, ci *callInfo) {
callLatency := float64(time.Since(startTime)) / float64(time.Second) // calculate ASAP
attrs := otelmetric.WithAttributeSet(otelattribute.NewSet(
otelattribute.String("grpc.method", ci.method),
otelattribute.String("grpc.target", ci.target),
otelattribute.String("grpc.status", canonicalString(status.Code(err))),
))
h.clientMetrics.callDuration.Record(ctx, callLatency, attrs)
// perCallTracesAndMetrics records per call trace spans and metrics.
func (h *clientStatsHandler) perCallTracesAndMetrics(ctx context.Context, err error, startTime time.Time, ci *callInfo, ts trace.Span) {
if h.options.isTracingEnabled() {
s := status.Convert(err)
if s.Code() == grpccodes.OK {
ts.SetStatus(otelcodes.Ok, s.Message())
} else {
ts.SetStatus(otelcodes.Error, s.Message())
}

ts.End()
}
if h.options.isMetricsEnabled() {
callLatency := float64(time.Since(startTime)) / float64(time.Second)
attrs := otelmetric.WithAttributeSet(otelattribute.NewSet(
otelattribute.String("grpc.method", ci.method),
otelattribute.String("grpc.target", ci.target),
otelattribute.String("grpc.status", canonicalString(status.Code(err))),
))
h.clientMetrics.callDuration.Record(ctx, callLatency, attrs)
}
}

// TagConn exists to satisfy stats.Handler.
@@ -163,15 +185,17 @@
}
ctx = istats.SetLabels(ctx, labels)
}
ai := &attemptInfo{ // populates information about RPC start.
ai := &attemptInfo{
startTime: time.Now(),
xdsLabels: labels.TelemetryLabels,
method: info.FullMethodName,
method: removeLeadingSlash(info.FullMethodName),
}
ri := &rpcInfo{
ai: ai,
if h.options.isTracingEnabled() {
ctx, ai = h.traceTagRPC(ctx, info, ai)
}
return setRPCInfo(ctx, ri)
return setRPCInfo(ctx, &rpcInfo{
ai: ai,
})
}

func (h *clientStatsHandler) HandleRPC(ctx context.Context, rs stats.RPCStats) {
@@ -180,7 +204,12 @@
logger.Error("ctx passed into client side stats handler metrics event handling has no client attempt data present")
return
}
h.processRPCEvent(ctx, rs, ri.ai)
if h.options.isMetricsEnabled() {
h.processRPCEvent(ctx, rs, ri.ai)
}
if h.options.isTracingEnabled() {
populateSpan(rs, ri.ai)
}
}

func (h *clientStatsHandler) processRPCEvent(ctx context.Context, s stats.RPCStats, ai *attemptInfo) {
55 changes: 55 additions & 0 deletions stats/opentelemetry/client_tracing.go
@@ -0,0 +1,55 @@
/*
* Copyright 2024 gRPC authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package opentelemetry

import (
"context"
"strings"

"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/trace"
"google.golang.org/grpc/stats"
otelinternaltracing "google.golang.org/grpc/stats/opentelemetry/internal/tracing"
)

// traceTagRPC populates the provided context with a new attempt span. Using
// the TextMapPropagator supplied in the trace options and the internal
// tracing carrier, it creates a new outgoing carrier that serializes
// information about this span into gRPC metadata. If TextMapPropagator is
// not provided, it returns the context as is.
func (h *clientStatsHandler) traceTagRPC(ctx context.Context, rti *stats.RPCTagInfo, ai *attemptInfo) (context.Context, *attemptInfo) {
mn := "Attempt." + strings.Replace(ai.method, "/", ".", -1)
tracer := otel.Tracer("grpc-open-telemetry")
ctx, span := tracer.Start(ctx, mn)
carrier := otelinternaltracing.NewOutgoingCarrier(ctx)
otel.GetTextMapPropagator().Inject(ctx, carrier)
ai.traceSpan = span
return carrier.Context(), ai
}

// createCallTraceSpan creates a call span and puts it in the provided context,
// using the provided TracerProvider. If TracerProvider is nil, it returns the
// context as is.
func (h *clientStatsHandler) createCallTraceSpan(ctx context.Context, method string) (context.Context, trace.Span) {
if h.options.TraceOptions.TracerProvider == nil {
logger.Error("TracerProvider is not provided in trace options")
return ctx, nil
}

mn := strings.Replace(removeLeadingSlash(method), "/", ".", -1)
tracer := otel.Tracer("grpc-open-telemetry")
ctx, span := tracer.Start(ctx, mn, trace.WithSpanKind(trace.SpanKindClient))
return ctx, span
}
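The two span-creating functions name their spans differently. The call span uses the full method name with its leading slash removed and the remaining "/" replaced by ".". The attempt span prefixes "Attempt." to the same form, because TagRPC has already applied removeLeadingSlash to ai.method. The sketch below illustrates both conventions. `callSpanName` and `attemptSpanName` are hypothetical, and `removeLeadingSlash` is a hypothetical reimplementation of the PR's helper, whose body is not shown. strings.ReplaceAll is used here; it behaves the same as strings.Replace with n = -1.

```go
package main

import (
	"fmt"
	"strings"
)

// removeLeadingSlash is a hypothetical reimplementation of the helper the PR
// calls; it drops a single leading "/" from a full method name.
func removeLeadingSlash(m string) string {
	return strings.TrimPrefix(m, "/")
}

// callSpanName follows createCallTraceSpan: strip the leading slash, then
// replace the remaining "/" with ".".
func callSpanName(fullMethod string) string {
	return strings.ReplaceAll(removeLeadingSlash(fullMethod), "/", ".")
}

// attemptSpanName follows traceTagRPC, which receives the slash-stripped
// method and prefixes "Attempt.".
func attemptSpanName(fullMethod string) string {
	return "Attempt." + strings.ReplaceAll(removeLeadingSlash(fullMethod), "/", ".")
}

func main() {
	m := "/grpc.testing.TestService/UnaryCall"
	fmt.Println(callSpanName(m))    // grpc.testing.TestService.UnaryCall
	fmt.Println(attemptSpanName(m)) // Attempt.grpc.testing.TestService.UnaryCall
}
```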