OnEnding should make spans readonly, except within the processor. #1740

dmathieu · 2024-10-03T18:49:00Z

The OnEnding spec mentions:
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#onending

> The SDK MUST guarantee that the span can no longer be modified by any other thread before invoking OnEnding of the first SpanProcessor

The current OnEnding implementation doesn't prevent the span from being modified in another thread, concurrently with the processor's action.

dmathieu · 2024-10-03T18:49:27Z

cc @pantuza

dmathieu · 2024-10-03T18:51:43Z

Two ways this can be done:

Java does it by storing the ID of the thread that runs OnEnding and preventing any other thread from modifying the span.
https://github.com/open-telemetry/opentelemetry-java/pull/6367/files#diff-d716139d47f651d6b6558f3e6f03aa59c98e52723f472b2c6a24bd02ec5fe2ceR338-R342

Go is going to set up a wrapper span that bypasses the read-only restriction and is provided to the processor.
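
For illustration, here is a rough Ruby sketch of the wrapper idea. It is not taken from the Go or Ruby SDKs: `ReadOnlyAfterEndSpan`, `EndingSpanWrapper`, `EnrichingProcessor`, `on_ending` and `force_set_attribute` are all hypothetical names, and the real implementations may look quite different.

```ruby
# Hypothetical span: becomes read-only as soon as finish starts.
class ReadOnlyAfterEndSpan
  def initialize(processors)
    @processors = processors
    @mutex = Mutex.new
    @attributes = {}
    @read_only = false
  end

  # Returns a copy so callers cannot mutate internal state.
  def attributes
    @mutex.synchronize { @attributes.dup }
  end

  # Public mutation path: silently ignored once the span is read-only.
  def set_attribute(key, value)
    @mutex.synchronize { @attributes[key] = value unless @read_only }
    self
  end

  def finish
    @mutex.synchronize do
      return self if @read_only
      @read_only = true
    end
    # The lock is released here, so processors never re-enter a held Mutex.
    wrapper = EndingSpanWrapper.new(self)
    @processors.each { |processor| processor.on_ending(wrapper) }
    self
  end

  private

  # Bypass used only by the wrapper handed to processors.
  def force_set_attribute(key, value)
    @mutex.synchronize { @attributes[key] = value }
  end
end

# Hypothetical wrapper: the only object that can still write to the span.
class EndingSpanWrapper
  def initialize(span)
    @span = span
  end

  def set_attribute(key, value)
    @span.__send__(:force_set_attribute, key, value)
    self
  end
end

# Hypothetical processor that mutates the span while it is ending.
class EnrichingProcessor
  def on_ending(span)
    span.set_attribute('enriched', true)
  end
end

wrapped_span = ReadOnlyAfterEndSpan.new([EnrichingProcessor.new])
wrapped_span.set_attribute('before', 1)
wrapped_span.finish
wrapped_span.set_attribute('after', 2) # ignored: the span is read-only now
```

In this sketch any thread holding only the original span is rejected once `finish` has started, while the processor can still write through the wrapper.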

github-actions · 2024-11-03T02:04:35Z

👋 This issue has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this issue will be closed eventually by the stale bot.

dmathieu · 2024-11-04T08:09:17Z

Should this be reported elsewhere?

arielvalentin · 2024-11-04T14:33:10Z

@dmathieu this is the appropriate place, though I am not sure this is the bug we will run into when our users start to make changes to the span.

Span mutation methods leverage a mutex instance variable that prevents multiple threads or fibers from mutating the span at the same time.

In Span#finish, all span processors will have to have completed their on_finish blocks before the lock is released and the span may be mutated again:

opentelemetry-ruby/sdk/lib/opentelemetry/sdk/trace/span.rb

Line 268 in 555b062

@mutex.synchronize do

That is where I think the real problem is. I assert that when a span processor attempts to mutate the underlying span, it will run into a thread deadlock situation.

Has anyone implemented one of these processors in the wild yet?

cc: @pantuza @mwear @kaylareopelle #1713

pantuza · 2024-11-06T00:10:31Z

Hi folks, I agree that the span.finish() method safely calls the processors' on_finishing() method, which prevents other threads from trying to modify the span. However, if a user calls processor.on_finishing() directly from many threads, it can indeed cause a concurrency issue.

Calling this processor method directly probably isn't desirable, but we are never fully aware of how users will use the library. So it might be necessary to protect this method individually. Does that make sense?

@arielvalentin, I did not understand how mutex.synchronize would cause the deadlock. Can you share an example, please? The only situation I can foresee is one where a processor's on_finishing() implementation gets stuck and never releases the Mutex for the next thread trying to acquire it. Is that what you were saying?

arielvalentin · 2024-11-06T04:40:08Z

It's likely not a deadlock situation as much as an error. I am concerned that mutexes are not re-entrant.

Assuming that is the case then when a span processor attempts to mutate it, then it will hit an additional synchronize block and that may result in errors.
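
A small standalone check of that concern, independent of the SDK code (the variable names are only for this example): in Ruby, a thread that tries to lock a Mutex it already holds gets a ThreadError rather than blocking. Monitor from the standard library is shown only for contrast, since it is re-entrant.

```ruby
require 'monitor'

mutex = Mutex.new

# Re-acquiring a Mutex already held by the current thread raises ThreadError.
recursive_lock_error =
  begin
    mutex.synchronize do
      mutex.synchronize { :unreachable }
    end
    nil
  rescue ThreadError => e
    e
  end

puts recursive_lock_error.class

# For contrast, a Monitor can be re-entered by the thread that holds it.
monitor = Monitor.new
nested_result = monitor.synchronize { monitor.synchronize { :ok } }
puts nested_result
```

So a processor that calls a mutating method while Span#finish still holds the lock would be expected to see an exception on that thread, not a hang.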

pantuza · 2024-11-06T14:29:54Z

> It's likely not a deadlock situation as much as an error. I am concerned that mutexes are not re-entrant.
>
> Assuming that is the case then when a span processor attempts to mutate it, then it will hit an additional synchronize block and that may result in errors.

You are totally right! If any given processor tries to mutate the span, for example by setting attributes, it will try to acquire the same Mutex that the finish() method has just acquired. Therefore the thread cannot proceed, because Ruby mutexes are not re-entrant, as you mentioned before.

I see two alternatives:

1. Use a separate Mutex for span mutations, such as:
   a. add_attribute, set_attribute, add_link, add_event
   b. Current mutex being created
2. Use some other logic that does not require the same Mutex for both operations (finish and set_attributes):
   a. Like the @ended variable we already use.
   b. As other languages did: Java, cited by @dmathieu above in this conversation.
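
To make the second alternative more concrete, here is a minimal sketch of the thread-ownership idea, loosely modeled on the Java approach linked above. It is not the opentelemetry-ruby implementation: `SketchSpan`, `TaggingProcessor` and `@ending_thread` are hypothetical, and the sketch assumes processors are invoked outside the lock.

```ruby
# Hypothetical span that remembers which thread is running finish.
class SketchSpan
  def initialize(processors)
    @processors = processors
    @mutex = Mutex.new
    @attributes = {}
    @ending_thread = nil
    @ended = false
  end

  # Returns a copy so callers cannot mutate internal state.
  def attributes
    @mutex.synchronize { @attributes.dup }
  end

  def set_attribute(key, value)
    @mutex.synchronize do
      @attributes[key] = value if modifiable?
    end
    self
  end

  def finish
    @mutex.synchronize do
      return self if @ended || @ending_thread
      @ending_thread = Thread.current
    end
    begin
      # The lock is not held here, so a processor can call set_attribute
      # without recursive locking.
      @processors.each { |processor| processor.on_finishing(self) }
    ensure
      @mutex.synchronize do
        @ending_thread = nil
        @ended = true
      end
    end
    self
  end

  private

  # While ending, only the thread that runs finish may modify the span.
  def modifiable?
    return false if @ended
    @ending_thread.nil? || @ending_thread == Thread.current
  end
end

# Hypothetical processor: mutates the span, then lets another thread try.
class TaggingProcessor
  def on_finishing(span)
    span.set_attribute('processed', true)
    Thread.new { span.set_attribute('late', true) }.join # rejected
  end
end

owned_span = SketchSpan.new([TaggingProcessor.new])
owned_span.set_attribute('before', 1)
owned_span.finish
owned_span.set_attribute('after', 2) # rejected: span has ended
```

With this shape, writes from the processor's own thread succeed during `finish`, and writes from other threads or after the span has ended are dropped.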

dmathieu added the bug Something isn't working label Oct 3, 2024

github-actions bot added the stale label Nov 3, 2024

arielvalentin added keep and removed stale labels Nov 4, 2024
