-
Notifications
You must be signed in to change notification settings - Fork 241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add basic periodic exporting metric_reader #1603
Conversation
👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the |
|
||
def close | ||
@continue = false # force termination in next iteration | ||
@thread.join(5) # wait 5 seconds for collecting and exporting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 5 seconds? Should it be configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I take reference on what how Java handle the thread waiting (here). I don't see spec mention about the thread waiting time but it seems like it should be configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the source, Xuan. Does it make sense for the close
method to use the @interval_millis
value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I like using @interval_millis
in join as well. Updated.
…o periodic-reader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for putting this together, Xuan! I'm so glad you're making progress on the Metrics work.
metrics_sdk/lib/opentelemetry/sdk/metrics/export/periodic_metric_reader.rb
Outdated
Show resolved
Hide resolved
metrics_sdk/lib/opentelemetry/sdk/metrics/export/periodic_metric_reader.rb
Outdated
Show resolved
Hide resolved
metrics_sdk/lib/opentelemetry/sdk/metrics/export/periodic_metric_reader.rb
Outdated
Show resolved
Hide resolved
|
||
def close | ||
@continue = false # force termination in next iteration | ||
@thread.join(5) # wait 5 seconds for collecting and exporting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the source, Xuan. Does it make sense for the close
method to use the @interval_millis
value?
metrics_sdk/lib/opentelemetry/sdk/metrics/export/periodic_metric_reader.rb
Show resolved
Hide resolved
metrics_sdk/lib/opentelemetry/sdk/metrics/export/periodic_metric_reader.rb
Outdated
Show resolved
Hide resolved
OpenTelemetry.handle_error(exception: e, message: 'Fail to close PeriodicMetricReader.') | ||
end | ||
|
||
# TODO: determine correctness: directly kill the reader without waiting for next metrics collection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find any clues in the spec. Looking at our SDK, the BatchSpanProcessor also has some periodic functionality and calls force_flush during shutdown, skipping the periodic wait.
Maybe that's the right call here too?
opentelemetry-ruby/sdk/lib/opentelemetry/sdk/trace/export/batch_span_processor.rb
Lines 133 to 152 in 540b77e
# Shuts the consumer thread down and flushes the current accumulated buffer | |
# will block until the thread is finished. | |
# | |
# @param [optional Numeric] timeout An optional timeout in seconds. | |
# @return [Integer] SUCCESS if no error occurred, FAILURE if a | |
# non-specific failure occurred, TIMEOUT if a timeout occurred. | |
def shutdown(timeout: nil) | |
start_time = OpenTelemetry::Common::Utilities.timeout_timestamp | |
thread = lock do | |
@keep_running = false | |
@condition.signal | |
@thread | |
end | |
thread&.join(timeout) | |
force_flush(timeout: OpenTelemetry::Common::Utilities.maybe_timeout(timeout, start_time)) | |
dropped_spans = lock { spans.shift(spans.length) } | |
report_dropped_spans(dropped_spans, reason: 'terminating') if dropped_spans.any? | |
@exporter.shutdown(timeout: OpenTelemetry::Common::Utilities.maybe_timeout(timeout, start_time)) | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Updated.
…ic_reader.rb Co-authored-by: Kayla Reopelle (she/her) <[email protected]>
…ic_reader.rb Co-authored-by: Kayla Reopelle (she/her) <[email protected]>
…ic_reader.rb Co-authored-by: Kayla Reopelle (she/her) <[email protected]>
Co-authored-by: Kayla Reopelle (she/her) <[email protected]>
…ry-ruby into periodic-reader
exporter: nil) | ||
super() | ||
|
||
@interval_millis = interval_millis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does _millis
mean? I thought "milliseconds" but after experimenting with this, it seems to be seconds, not milliseconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, the name is indeed confusing. It should be seconds. I have updated the name to export_interval and export_timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks! :)
What is needed for this PR to get merged? |
module Export | ||
# PeriodicMetricReader provides a minimal example implementation. | ||
class PeriodicMetricReader < MetricReader | ||
def initialize(export_interval: ENV.fetch('OTEL_METRIC_EXPORT_INTERVAL', 60), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the spec, the OTEL_METRIC_EXPORT_INTERVAL
and OTEL_METRIC_EXPORT_TIMEOUT
env variables should be in milliseconds. See https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#periodic-exporting-metricreader.
And according to https://opentelemetry.io/docs/specs/otel/metrics/sdk/#periodic-exporting-metricreader, the parameters should be named export_interval_millis
and export_timeout_millis
.
Furthermore, I think we should use Float
for intervals, timeouts etc. same as in BatchSpanProcessor
and other processors.
export_interval_millis: Float(ENV.fetch('OTEL_METRIC_EXPORT_INTERVAL', 60_000)),
export_timeout_millis: Float(ENV.fetch('OTEL_METRIC_EXPORT_TIMEOUT', 30_000)),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review and reference on batch span processor. Updated.
while @continue | ||
sleep(@export_interval) | ||
begin | ||
Timeout.timeout(@export_timeout) { @exporter.export(collect) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also pass timeout: @export_timeout
to the @exporter
?
Same as here
opentelemetry-ruby/sdk/lib/opentelemetry/sdk/trace/export/batch_span_processor.rb
Line 188 in 5172e84
result_code = @export_mutex.synchronize { @exporter.export(batch, timeout: timeout) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Timeout.timeout(@export_timeout) { ... }
will serve the purpose of timeout exporter if the exporter doesn't behave correctly (e.g. hanging).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I was just saying maybe we should pass the timeout also to the exporter to give it a chance to behave properly. We still keep the surrounding Timeout.timeout(@export_timeout) { ... }
block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. I have updated the part that pass timeout to exporter
begin | ||
Timeout.timeout(@export_timeout) { @exporter.export(collect) } | ||
rescue Timeout::Error => e | ||
OpenTelemetry.handle_error(exception: e, message: 'PeriodicMetricReader timeout.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is OpenTelemetry.handle_error(exception: e, message: 'PeriodicMetricReader timeout.')
evaluating to FAILURE
? if not then we should return FAILURE
explicitly here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think metric_reader shouldn't return anything (similar to metric_reader); only exporter will return either fail or success from export
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, you are absolutely right.
Description
I'd like to contribute on Periodic exporting MetricReader. I think the periodic metric_reader is useful for asynchronous instrument that collect metrics (e.g. cpu, memory) within fixed duration.
Closes #1627