Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove tracing dependency from http-lib and add baggage #6065

Merged
merged 12 commits into from
Nov 26, 2024

Conversation

contificate
Copy link
Contributor

@contificate contificate commented Oct 18, 2024

Removes dependency on tracing from http-lib and add baggage.


The introduction of it as a dependency was to support this style of code:

let@ req = Http.Request.with_tracing ~name req in

However, HTTP headers are just a carrier in the context of tracing. The library ought to be as inert as possible; providing enough to endow requests with tracing information, but not calling into the tracing library to export child spans (as is done above).

To this end, the notion of a propagator is introduced that provides an interface for doing the above, but generalised to arbitrary carriers. If you can describe how to inject/extract a "trace context" into/from a carrier, you can conceivably propagate tracing across it.


In future, the idea will be to:

  • This trace context can be inherited by spans, so it can reach the exporter. We can probably use the SpanContext.t record to do this. If this information can reach the exporter, we can choose to decorate spans with tags derived from the trace context (such as baggage) during exportation.
  • At incoming service boundaries, instead of pushing down a Span.t option derived from an encoded traceparent in the carrier, we will push down TraceContext (which is more general than the encoded SpanContext carried by the optional Span.t that's currently pushed down). This is so we can more easily handle the case where baggage is provided but a traceparent isn't: in which case, a span is still created if tracing is enabled, but with no parent to inherit contextual metadata from (a previous - abandoned - PR introduced baggage in a way that only worked when a traceparent was provided).

Opening this as a draft for now. Would appreciate any commentary. There appears to be duplication of Helper but this is largely because XAPI's dune defines separate libraries which have slightly different dependencies.

This has been manually tested with Jaeger, but more thorough testing should probably be done (comparing deeper span forests before and after these changes).

ocaml/libs/tracing/dune Outdated Show resolved Hide resolved
@contificate contificate force-pushed the private/cbarr/otel-baggage branch from c2ea03d to 78eb426 Compare October 19, 2024 21:57
@contificate
Copy link
Contributor Author

contificate commented Oct 19, 2024

Rebased atop master to resolve conflicts.

I intend to get to the review comments but we've seen a failure in related XenRT test: it's not part of a sequence that's included in automated testing, so I've submitted it against rt-next to see how it compares. Update: it fails on rt-next as well (4133113).

I intend to squash these commits eventually - the current history is for my own benefit, but hinders bisecting.

@contificate contificate force-pushed the private/cbarr/otel-baggage branch 2 times, most recently from 69b0210 to a2ab71c Compare October 23, 2024 08:10
@contificate
Copy link
Contributor Author

I've now managed to get trace contexts to propagate properly.

Example of BAGGAGE="baggage=enabled" xe vm-group-create name-label=group placement=normal.

{A5E9CFD5-BBFA-4B84-84F4-851763EE7EE4}

There are a lot of small tickets that can spawn out of these efforts, as there's numerous things I think need to be addressed in how the tracing works.

@contificate contificate marked this pull request as ready for review October 23, 2024 08:36
@contificate
Copy link
Contributor Author

Removing the draft status, but I may still push some additions.

@contificate contificate changed the title Remove tracing dependency from http-lib Remove tracing dependency from http-lib and add baggage Oct 23, 2024
@contificate
Copy link
Contributor Author

Ring3 BVT+BST is all green (206737) except for one subtask that hasn't been scheduled (yet) due to lab availability. We are in the process of attempting to upstream the fix in the failing XenRT test that relates to tracing.

@contificate contificate force-pushed the private/cbarr/otel-baggage branch from a2ab71c to f844e9e Compare November 5, 2024 10:49
@@ -436,18 +442,25 @@ let read_request_exn ~proxy_seen ~read_timeout ~total_timeout ~max_length fd =
already sent back a suitable error code and response to the client. *)
let read_request ?proxy_seen ~read_timeout ~total_timeout ~max_length fd =
try
(* TODO: Restore functionality of tracing this function. We rely on the request
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment current? seems outdated to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is current, the code that's here now does not measure what we want to measure (if I recall correctly). The old version of the code works in practice but is still very questionable.

I believe the span is meant to track the read_request_exn invocation but, since the traceparent (and other fields comprising a trace context) is encoded in the request read by this function, the old code does an awkward reparenting of the span. The call for doing that is retained below, but really it's there as evidence that the current API doesn't quite fit every use case.

The parent seems like an easy thing to omit, but the spans carry the trace context that contains any baggage and, so, if a derived span from the tracing started here were to go somewhere else, that would be lost. You can always "(re)parent" because the endpoints are designed to work that way: Jaeger will stitch together a span forest even if parent-child span pairs are exported by different processes, by design. The way we're forcibly exporting baggage as part of attributes means it must be carried if it is to be exported on the OCaml side.

There are ideas in Apollo about how best to propagate metadata around more effectively. Piggy backing off of spans has led to all kinds of "faux" span inheritance, reparenting, etc. tricks (albeit in very specific places in the codebase).

(* tracing *)
(* in *)
ignore tracing ;
?subtask_of ?query ?body path =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the trace context injected now that this has been removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each usage site endows the returned Http.Request.t with tracing-related headers using the relevant propagator. I thought it would be simpler to remove any knowledge of tracing from the library.

{trace_id; span_id; trace_context= extra_context}
in
let context =
(* If trace_context is provided to the call, override any inherited trace context. *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any "inherit trace context" comes from ~parent. Is the idea that eventually ~parent will disappear in favour of trace_context, or is the latter only used between service boundaries (e.g. when making a call to another process)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I think it's best to restrict it between service boundaries. A lot of the API is designed in a way that passes a parent : Span.t option, which would be lots of work to replace.

An alternative API design would maybe choose to be more explicit: rather than propagating None, it would always propagate something (either extracted from a carrier or created at the point where tracing intends to begin) which is easily distinguishable (so, during debugging, you can chase up parent links and find that a service-related incoming span is an ancestor, perhaps also endowed with provenance information such as __MODULE__). That kind of design would make service boundaries far more explicit in the code but also avoid the tracing library inventing a parent for you (several call frames deep).

The tracing API maybe ought to expose slightly less. It is easy to see the seams: span_of_span_context, update_span_with_parent, etc. That said, the problems seem to concentrate themselves at service boundaries for which I was unfamiliar with the related flow of control within xapi.

Copy link
Member

@robhoes robhoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the main idea to take tracing out of the core HTTP library is desirable, as it would simplify replacing it with another HTTP library at some point in future. The way of "propagating" tracing information and injecting/extracting it from a carrier such as an HTTP request is an abstraction that is familiar from the world of OTel and opens up new ways of extending our use of OTel.

@contificate contificate force-pushed the private/cbarr/otel-baggage branch from f844e9e to e6c840a Compare November 22, 2024 08:00
This is a breaking change to http-lib that removes the special treatment
of traceparent from the Http.Request module record.

In order to more easily include other tracing-related headers in future
we propose that we create an aggregate data structure (named something
like "trace context") which can (optionally) contain traceparent,
baggage, tracestate, etc. and then we "inject" those into the
additional_headers of the request we wish to endow with tracing-related
information. On the receiving end, a similar "extract" function will
perform the dual operation, extracting tracing context from the request:

Generally, for some carrier, we intend to provide two operations:

val inject : context -> carrier -> carrier

val extract : carrier -> context

The "carrier" is whatever transport is being used to propagate tracing
across service boundaries. In our case, HTTP requests.

Signed-off-by: Colin James <[email protected]>
Temporarily remove parts of the code that worked with requests directly,
manipulating (or using) traceparents.

Signed-off-by: Colin James <[email protected]>
Introduces a more general trace context record that will encapsulate the
metadata of tracing that can be propagated across service boundaries.

Signed-off-by: Colin James <[email protected]>
In a new library, tracing_propagator, simple injection and extraction
routines are provided for rewriting HTTP requests (Http.Request) to have
trace-related information.

It must be a new library as a cycle would be introduced if we attempted
to make tracing depend on http-lib (as http-lib depends on tracing).

Signed-off-by: Colin James <[email protected]>
Restore code that was previously disabled.

Signed-off-by: Colin James <[email protected]>
The tracing library is removed as a dependency from http-lib. It is
still a dependency of http-svr. This is currently a breaking change.

The plan is to:
- Factor out - and generalise - the helpers defined in place of removing
  the functions Http.Request.traceparent_of and
  Http.Request.with_tracing. The comment describes how this should be
  done: consolidate the impl into the tracing library itself and then
  provide inject and extract routines to do it for arbitrary carriers.
- Rewrite xmlrpc to accept an arbitrary request rewriter function such
  as (Http.Request.t -> Http.Request.t) before it
  dispatches. Then, tracing can be injected in by the user.

Signed-off-by: Colin James <[email protected]>
Generalise the pattern used in http_svr.ml into a functor in the tracing
library. In particular, so long as you can provide a way to name a
carrier (to endow its child trace with), inject tracing context into a
carrier, and extract trace context into a carrier, you can use this
generic pattern to propagate tracing across arbitrary carriers (where
the derived span is exported, but the incoming one is not).

To this end, we factor it out of http_svr.ml and redefine the helper
module used there in terms of one constructed from the new tracing
propagator functor.

The tracing_propagator library is used to provide definitions in the
input module for defining trace propagation across HTTP headers.

Signed-off-by: Colin James <[email protected]>
Use trace propagators to endow the Http.Request.t used for XML-RPC
requests with in-service information, in order to propagate it.

As before, the parent span is named "xe <cmdname>" and its span-id is
what makes its way into the request. If the endpoint receiving the
request wishes to, they can derive subsequent in-service tracing, rooted
from this, by using with_tracing as define in Tracing's Propagator.

Signed-off-by: Colin James <[email protected]>
- Wrap tracing_propagator library
- Drop point-free style in parse

Signed-off-by: Colin James <[email protected]>
If baggage is present in the environment, it will be sent to
xapi-cli-server.

Signed-off-by: Colin James <[email protected]>
Adds TraceContext to the SpanContext data structure and attempts to
ensure its inheritance through each part of the code base.

The API exposed by the tracing library is a bit problematic, it ought to
be simplified and adapted to various use cases.

Signed-off-by: Colin James <[email protected]>
@contificate contificate force-pushed the private/cbarr/otel-baggage branch from e6c840a to c5fe9ba Compare November 26, 2024 11:09
@robhoes robhoes added this pull request to the merge queue Nov 26, 2024
Merged via the queue into xapi-project:master with commit 6f64a78 Nov 26, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants