
feat: add basic periodic exporting metric_reader #1603

Merged
merged 20 commits into from
Aug 21, 2024

Conversation

xuan-cao-swi
Contributor

@xuan-cao-swi xuan-cao-swi commented Feb 22, 2024

Description

I'd like to contribute the periodic exporting MetricReader. I think the periodic metric reader is useful for asynchronous instruments that collect metrics (e.g. CPU, memory) at a fixed interval.

Closes #1627
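To make the collect-and-export loop concrete, here is a minimal, self-contained sketch of the pattern in plain Ruby. This is an illustration only, not the PR's actual class: `TinyPeriodicReader` and all of its names are invented for the example, and the real reader reports errors through `OpenTelemetry.handle_error` rather than swallowing them.

```ruby
require 'timeout'

# Illustrative sketch of a periodic exporting reader: a background thread
# sleeps for the export interval, collects, and exports under a timeout.
class TinyPeriodicReader
  def initialize(exporter, interval: 60, timeout: 30)
    @exporter = exporter
    @interval = interval
    @timeout  = timeout
    @continue = true
    @thread   = Thread.new { run }
  end

  def collect
    [] # placeholder: gather metrics from asynchronous instruments here
  end

  def close
    @continue = false        # force termination on the next iteration
    @thread.join(@timeout)   # bounded wait for the in-flight collect/export
  end

  private

  def run
    while @continue
      sleep(@interval)
      begin
        Timeout.timeout(@timeout) { @exporter.export(collect) }
      rescue Timeout::Error
        # a real reader would report this via OpenTelemetry.handle_error
      end
    end
  end
end
```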

Contributor

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing, this pull request will be closed eventually by the stale bot.

@github-actions github-actions bot added the stale label Mar 24, 2024

def close
@continue = false # force termination in next iteration
@thread.join(5) # wait 5 seconds for collecting and exporting
Member

Why 5 seconds? Should it be configurable?

Contributor Author

I think I took this from how Java handles the thread wait (here). I don't see the spec mention the thread wait time, but it seems like it should be configurable.

Contributor

Thanks for the source, Xuan. Does it make sense for the close method to use the @interval_millis value?

Contributor Author

Yes, I like using @interval_millis in join as well. Updated.
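The bounded wait works because of `Thread#join(limit)` semantics in core Ruby: it returns the thread if it finished within `limit` seconds, or `nil` on timeout, so `close` never blocks longer than the configured interval. A small illustration (the sleep durations are arbitrary):

```ruby
# Thread#join(limit) returns the thread when it finishes in time...
worker = Thread.new { sleep 0.05 }
finished = worker.join(1)
raise unless finished == worker

# ...and nil when the limit expires first.
stuck = Thread.new { sleep 10 }
timed_out = stuck.join(0.05)
raise unless timed_out.nil?
stuck.kill
```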

@github-actions github-actions bot removed the stale label Mar 26, 2024
Contributor

@kaylareopelle kaylareopelle left a comment

Thank you for putting this together, Xuan! I'm so glad you're making progress on the Metrics work.


def close
@continue = false # force termination in next iteration
@thread.join(5) # wait 5 seconds for collecting and exporting

OpenTelemetry.handle_error(exception: e, message: 'Fail to close PeriodicMetricReader.')
end

# TODO: determine correctness: directly kill the reader without waiting for next metrics collection
Contributor

I didn't find any clues in the spec. Looking at our SDK, the BatchSpanProcessor also has some periodic functionality and calls force_flush during shutdown, skipping the periodic wait.

Maybe that's the right call here too?

# Shuts the consumer thread down and flushes the current accumulated buffer
# will block until the thread is finished.
#
# @param [optional Numeric] timeout An optional timeout in seconds.
# @return [Integer] SUCCESS if no error occurred, FAILURE if a
#   non-specific failure occurred, TIMEOUT if a timeout occurred.
def shutdown(timeout: nil)
  start_time = OpenTelemetry::Common::Utilities.timeout_timestamp
  thread = lock do
    @keep_running = false
    @condition.signal
    @thread
  end
  thread&.join(timeout)
  force_flush(timeout: OpenTelemetry::Common::Utilities.maybe_timeout(timeout, start_time))
  dropped_spans = lock { spans.shift(spans.length) }
  report_dropped_spans(dropped_spans, reason: 'terminating') if dropped_spans.any?
  @exporter.shutdown(timeout: OpenTelemetry::Common::Utilities.maybe_timeout(timeout, start_time))
end
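The snippet above threads one overall timeout budget through several steps via `Utilities.maybe_timeout`. A hedged sketch of that arithmetic (my paraphrase for illustration, not the SDK's actual source): each step receives only the time remaining from the caller's budget, and `nil` means "no limit".

```ruby
# Illustrative remaining-budget helper: given an overall timeout and the
# monotonic start time, return the seconds left (never negative), or nil
# when no limit was requested.
def remaining_timeout(timeout, start_time)
  return nil if timeout.nil?
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time
  remaining = timeout - elapsed
  remaining.positive? ? remaining : 0
end
```

Chaining `join`, `force_flush`, and `exporter.shutdown` through this keeps the whole shutdown under one user-supplied deadline instead of each step getting the full timeout.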

Contributor Author

Thanks! Updated.

exporter: nil)
super()

@interval_millis = interval_millis


What does _millis mean? I thought "milliseconds" but after experimenting with this, it seems to be seconds, not milliseconds.

Contributor Author

Thanks, the name is indeed confusing. It should be seconds. I have updated the name to export_interval and export_timeout.


Awesome, thanks! :)

@cdan-youdo

What is needed for this PR to get merged?

module Export
# PeriodicMetricReader provides a minimal example implementation.
class PeriodicMetricReader < MetricReader
def initialize(export_interval: ENV.fetch('OTEL_METRIC_EXPORT_INTERVAL', 60),


According to the spec, the OTEL_METRIC_EXPORT_INTERVAL and OTEL_METRIC_EXPORT_TIMEOUT environment variables should be in milliseconds. See https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#periodic-exporting-metricreader.

And according to https://opentelemetry.io/docs/specs/otel/metrics/sdk/#periodic-exporting-metricreader, the parameters should be named export_interval_millis and export_timeout_millis.

Furthermore, I think we should use Float for intervals, timeouts, etc., the same as in BatchSpanProcessor and the other processors.

export_interval_millis: Float(ENV.fetch('OTEL_METRIC_EXPORT_INTERVAL', 60_000)),
export_timeout_millis: Float(ENV.fetch('OTEL_METRIC_EXPORT_TIMEOUT', 30_000)),

Contributor Author

Thanks for the review and reference on batch span processor. Updated.
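The unit mismatch discussed above matters because the OTEL_METRIC_EXPORT_* variables are specified in milliseconds while Ruby's `sleep` and `Thread#join` take seconds, so the reader has to convert. A hedged sketch of that conversion (the helper name is invented; the defaults of 60,000 ms and 30,000 ms are the spec's):

```ruby
# Illustrative helper: read the export interval and timeout from the
# environment (milliseconds per the spec) and return them in seconds.
def export_settings_in_seconds(env = ENV)
  interval_millis = Float(env.fetch('OTEL_METRIC_EXPORT_INTERVAL', 60_000))
  timeout_millis  = Float(env.fetch('OTEL_METRIC_EXPORT_TIMEOUT', 30_000))
  [interval_millis / 1000.0, timeout_millis / 1000.0]
end
```

With no environment overrides this yields the spec defaults of 60 s and 30 s.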

while @continue
  sleep(@export_interval)
  begin
    Timeout.timeout(@export_timeout) { @exporter.export(collect) }


Should we also pass timeout: @export_timeout to the @exporter?
Same as here

result_code = @export_mutex.synchronize { @exporter.export(batch, timeout: timeout) }

Contributor Author

I think the Timeout.timeout(@export_timeout) { ... } will serve the purpose of timing out the exporter if it doesn't behave correctly (e.g. hangs).


Sure, I was just saying maybe we should also pass the timeout to the exporter to give it a chance to behave properly. We'd still keep the surrounding Timeout.timeout(@export_timeout) { ... } block.

Contributor Author

Thanks for the suggestion. I have updated the code to pass the timeout to the exporter.
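The belt-and-braces approach agreed on above can be sketched as follows: pass the timeout to the exporter so a well-behaved exporter can honor it, but keep `Timeout.timeout` as a backstop against an exporter that hangs anyway. The helper name and stub exporters are illustrative, not the PR's code.

```ruby
require 'timeout'

# Illustrative: export under both a cooperative timeout (passed to the
# exporter) and a hard backstop (Timeout.timeout around the call).
def export_with_timeout(exporter, batch, timeout:)
  Timeout.timeout(timeout) { exporter.export(batch, timeout: timeout) }
rescue Timeout::Error
  :timeout
end
```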

begin
  Timeout.timeout(@export_timeout) { @exporter.export(collect) }
rescue Timeout::Error => e
  OpenTelemetry.handle_error(exception: e, message: 'PeriodicMetricReader timeout.')


Is OpenTelemetry.handle_error(exception: e, message: 'PeriodicMetricReader timeout.') evaluating to FAILURE? If not, we should return FAILURE explicitly here.

Contributor Author

I think the metric reader shouldn't return anything (similar to the base MetricReader); only the exporter will return either failure or success from export.


Ahh, you are absolutely right.

@mwear mwear merged commit 62bb150 into open-telemetry:main Aug 21, 2024
60 checks passed

Successfully merging this pull request may close these issues.

Enhance the thread management in periodic metric reader
7 participants