-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove tracing dependency from http-lib and add baggage #6065
Remove tracing dependency from http-lib and add baggage #6065
Conversation
c2ea03d
to
78eb426
Compare
Rebased atop master to resolve conflicts. I intend to get to the review comments but we've seen a failure in related XenRT test: it's not part of a sequence that's included in automated testing, so I've submitted it against I intend to squash these commits eventually - the current history is for my own benefit, but hinders bisecting. |
69b0210
to
a2ab71c
Compare
I've now managed to get trace contexts to propagate properly. Example of There are a lot of small tickets that can spawn out of these efforts, as there's numerous things I think need to be addressed in how the tracing works. |
Removing the draft status, but I may still push some additions. |
Ring3 BVT+BST is all green ( |
a2ab71c
to
f844e9e
Compare
@@ -436,18 +442,25 @@ let read_request_exn ~proxy_seen ~read_timeout ~total_timeout ~max_length fd = | |||
already sent back a suitable error code and response to the client. *) | |||
let read_request ?proxy_seen ~read_timeout ~total_timeout ~max_length fd = | |||
try | |||
(* TODO: Restore functionality of tracing this function. We rely on the request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment current? seems outdated to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is current, the code that's here now does not measure what we want to measure (if I recall correctly). The old version of the code works in practice but is still very questionable.
I believe the span is meant to track the read_request_exn
invocation but, since the traceparent
(and other fields comprising a trace context) is encoded in the request read by this function, the old code does an awkward reparenting of the span. The call for doing that is retained below, but really it's there as evidence that the current API doesn't quite fit every use case.
The parent seems like an easy thing to omit, but the spans carry the trace context that contains any baggage and, so, if a derived span from the tracing started here were to go somewhere else, that would be lost. You can always "(re)parent" because the endpoints are designed to work that way: Jaeger will stitch together a span forest even if parent-child span pairs are exported by different processes, by design. The way we're forcibly exporting baggage as part of attributes means it must be carried if it is to be exported on the OCaml side.
There are ideas in Apollo about how best to propagate metadata around more effectively. Piggy backing off of spans has led to all kinds of "faux" span inheritance, reparenting, etc. tricks (albeit in very specific places in the codebase).
(* tracing *) | ||
(* in *) | ||
ignore tracing ; | ||
?subtask_of ?query ?body path = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the trace context injected now that this has been removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each usage site endows the returned Http.Request.t
with tracing-related headers using the relevant propagator. I thought it would be simpler to remove any knowledge of tracing from the library.
{trace_id; span_id; trace_context= extra_context} | ||
in | ||
let context = | ||
(* If trace_context is provided to the call, override any inherited trace context. *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any "inherit trace context" comes from ~parent
. Is the idea that eventually ~parent
will disappear in favour of trace_context
, or is the latter only used between service boundaries (e.g. when making a call to another process)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I think it's best to restrict it between service boundaries. A lot of the API is designed in a way that passes a parent : Span.t option
, which would be lots of work to replace.
An alternative API design would maybe choose to be more explicit: rather than propagating None
, it would always propagate something (either extracted from a carrier or created at the point where tracing intends to begin) which is easily distinguishable (so, during debugging, you can chase up parent
links and find that a service-related incoming span is an ancestor, perhaps also endowed with provenance information such as __MODULE__
). That kind of design would make service boundaries far more explicit in the code but also avoid the tracing library inventing a parent for you (several call frames deep).
The tracing API maybe ought to expose slightly less. It is easy to see the seams: span_of_span_context
, update_span_with_parent
, etc. That said, the problems seem to concentrate themselves at service boundaries for which I was unfamiliar with the related flow of control within xapi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the main idea to take tracing out of the core HTTP library is desirable, as it would simplify replacing it with another HTTP library at some point in future. The way of "propagating" tracing information and injecting/extracting it from a carrier such as an HTTP request is an abstraction that is familiar from the world of OTel and opens up new ways of extending our use of OTel.
f844e9e
to
e6c840a
Compare
This is a breaking change to http-lib that removes the special treatment of traceparent from the Http.Request module record. In order to more easily include other tracing-related headers in future we propose that we create an aggregate data structure (named something like "trace context") which can (optionally) contain traceparent, baggage, tracestate, etc. and then we "inject" those into the additional_headers of the request we wish to endow with tracing-related information. On the receiving end, a similar "extract" function will perform the dual operation, extracting tracing context from the request: Generally, for some carrier, we intend to provide two operations: val inject : context -> carrier -> carrier val extract : carrier -> context The "carrier" is whatever transport is being used to propagate tracing across service boundaries. In our case, HTTP requests. Signed-off-by: Colin James <[email protected]>
Temporarily remove parts of the code that worked with requests directly, manipulating (or using) traceparents. Signed-off-by: Colin James <[email protected]>
Introduces a more general trace context record that will encapsulate the metadata of tracing that can be propagated across service boundaries. Signed-off-by: Colin James <[email protected]>
In a new library, tracing_propagator, simple injection and extraction routines are provided for rewriting HTTP requests (Http.Request) to have trace-related information. It must be a new library as a cycle would be introduced if we attempted to make tracing depend on http-lib (as http-lib depends on tracing). Signed-off-by: Colin James <[email protected]>
Restore code that was previously disabled. Signed-off-by: Colin James <[email protected]>
The tracing library is removed as a dependency from http-lib. It is still a dependency of http-svr. This is currently a breaking change. The plan is to: - Factor out - and generalise - the helpers defined in place of removing the functions Http.Request.traceparent_of and Http.Request.with_tracing. The comment describes how this should be done: consolidate the impl into the tracing library itself and then provide inject and extract routines to do it for arbitrary carriers. - Rewrite xmlrpc to accept an arbitrary request rewriter function such as (Http.Request.t -> Http.Request.t) before it dispatches. Then, tracing can be injected in by the user. Signed-off-by: Colin James <[email protected]>
Generalise the pattern used in http_svr.ml into a functor in the tracing library. In particular, so long as you can provide a way to name a carrier (to endow its child trace with), inject tracing context into a carrier, and extract trace context into a carrier, you can use this generic pattern to propagate tracing across arbitrary carriers (where the derived span is exported, but the incoming one is not). To this end, we factor it out of http_svr.ml and redefine the helper module used there in terms of one constructed from the new tracing propagator functor. The tracing_propagator library is used to provide definitions in the input module for defining trace propagation across HTTP headers. Signed-off-by: Colin James <[email protected]>
Use trace propagators to endow the Http.Request.t used for XML-RPC requests with in-service information, in order to propagate it. As before, the parent span is named "xe <cmdname>" and its span-id is what makes its way into the request. If the endpoint receiving the request wishes to, they can derive subsequent in-service tracing, rooted from this, by using with_tracing as define in Tracing's Propagator. Signed-off-by: Colin James <[email protected]>
Signed-off-by: Colin James <[email protected]>
- Wrap tracing_propagator library - Drop point-free style in parse Signed-off-by: Colin James <[email protected]>
If baggage is present in the environment, it will be sent to xapi-cli-server. Signed-off-by: Colin James <[email protected]>
Adds TraceContext to the SpanContext data structure and attempts to ensure its inheritance through each part of the code base. The API exposed by the tracing library is a bit problematic, it ought to be simplified and adapted to various use cases. Signed-off-by: Colin James <[email protected]>
e6c840a
to
c5fe9ba
Compare
Removes dependency on
tracing
fromhttp-lib
and add baggage.The introduction of it as a dependency was to support this style of code:
However, HTTP headers are just a carrier in the context of tracing. The library ought to be as inert as possible; providing enough to endow requests with tracing information, but not calling into the tracing library to export child spans (as is done above).
To this end, the notion of a propagator is introduced that provides an interface for doing the above, but generalised to arbitrary carriers. If you can describe how to inject/extract a "trace context" into/from a carrier, you can conceivably propagate tracing across it.
In future, the idea will be to:
SpanContext.t
record to do this. If this information can reach the exporter, we can choose to decorate spans with tags derived from the trace context (such as baggage) during exportation.Span.t option
derived from an encodedtraceparent
in the carrier, we will push downTraceContext
(which is more general than the encodedSpanContext
carried by the optionalSpan.t
that's currently pushed down). This is so we can more easily handle the case where baggage is provided but a traceparent isn't: in which case, a span is still created if tracing is enabled, but with no parent to inherit contextual metadata from (a previous - abandoned - PR introduced baggage in a way that only worked when a traceparent was provided).Opening this as a draft for now. Would appreciate any commentary. There appears to be duplication of
Helper
but this is largely because XAPI'sdune
defines separate libraries which have slightly different dependencies.This has been manually tested with Jaeger, but more thorough testing should probably be done (comparing deeper span forests before and after these changes).