Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally add OM unit #1392

Closed
wants to merge 28 commits into from
Closed

Optionally add OM unit #1392

wants to merge 28 commits into from

Conversation

vesari
Copy link
Contributor

@vesari vesari commented Nov 26, 2023

This PR adds support for unit for Open Metrics. It is ready for review, but not ready to merge (see my long comment below). Fixes #684.

Signed-off-by: Arianna Vespri <[email protected]>
Signed-off-by: Arianna Vespri <[email protected]>
Signed-off-by: Arianna Vespri <[email protected]>
Signed-off-by: Arianna Vespri <[email protected]>
Signed-off-by: Arianna Vespri <[email protected]>
Signed-off-by: Arianna Vespri <[email protected]>
@ArthurSens
Copy link
Member

Hey @vesari , just passing by to ask if there is anything we could do to unblock you 👋

@vesari
Copy link
Contributor Author

vesari commented Mar 15, 2024

Hey @vesari , just passing by to ask if there is anything we could do to unblock you 👋

@ArthurSens hello! I think I'm just using a Go version which is too new, I plan on fixing that later today. Thanks a lot for checking on me, much appreciated! :)

@vesari
Copy link
Contributor Author

vesari commented Mar 15, 2024

@ArthurSens hello again! Here are a few considerations to give you some insight about what I’ve done here and why.
I had to upgrade Go in the GitHub workflows so that the version is at least 1.21, as this is needed by common 0.50.0 which, as you know, contains the necessary implementation for unit. I also changed the Makefile and the go.mod and main.go files in the tutorial folder accordingly. I have not touched the .bingo file but I guess the version should be upgraded there too.
However, Tests(1.20) is still expected to run among the checks, and, as you can see, just hangs, but I suppose that’s because of the repo settings, so I’ll need your help there.

The lint was failing on a supposedly unused test function unrelated to my changes (withDebugMetrics), so I removed it. Let me know if I should instead just have the lint ignore it.

I made the changes needed where an OM format can be passed to expfmt.NewEncoder, and also all the others stemming from the field Unit being added to the type prometheus.Desc. In some cases, I chose to populate this field with the unit which was already included in the metric family name, but I could have also just put an empty string, and I’m not sure what the preferred behaviour would be or if it doesn’t make any difference. I’m talking especially about the various collectors files, like prometheus/collector/dbstats_collector.go , prometheus/go_collector.go or prometheus/go_collector_latest.go etc. I’d be grateful if you could advise me on this.

Also, so far I haven’t added any new tests to promhttp/http_test.go, considering not even the EnableOpenMetrics option has been tested there. Let me know if I should.

Lastly, I happened to push a lot of commits, let me know if you want me to rebase and squash them.

As per usual, many thanks for your time and help! :)

@ArthurSens
Copy link
Member

Oh sorry for the silence here, somehow I missed your last comment.

I had to upgrade Go in the GitHub workflows so that the version is at least 1.21, as this is needed by common 0.50.0 which, as you know, contains the necessary implementation for unit.

Yeah, that's a problem actually. We promise support for the 3 latest go versions, so we can't bump our go version to 1.21 until mid-August when go 1.23 is released. I've started conversations about how we can improve this situation with common.

The lint was failing on a supposedly unused test function unrelated to my changes (withDebugMetrics), so I removed it. Let me know if I should instead just have the lint ignore it.

This is another problem that I've been delaying for way too long 😬 From my understanding it doesn't block PRs though. But anyway, I'll try to get back to this soon, no need to remove it now :)


Regarding the changes that are related to OpenMetrics units, I haven't taken a deep look yet 😬. I'm mostly concerned with enabling the PR first and will review the changes once we solve all the blockers

@vesari
Copy link
Contributor Author

vesari commented Mar 29, 2024

Thanks a lot for the detailed update! Much appreciated

@ArthurSens
Copy link
Member

Hi @vesari, all blockers have been resolved finally :)

The unused function is now used, and we can update prometheus/common without giving up support of go 1.20. If you rebase your PR it should be fine :)

Comment on lines 386 to 388
// If true, WithUnit adds the unit to the encoder options, ultimately allowing the
// unit into the final output, provided that EnableOpenMetrics is also true.
WithUnit bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #1408, we're changing a bit the structure of the HandlerOpts regarding OpenMetrics functionality. We'll probably want something similar here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArthurSens I made some changes to my code accordingly, but I think it would be better for your PR to be merged before mine, as yours has already been approved and also because in yours you explain the rationale behind the new HandlerOpts struct and what options to prefer. I can then take it from there and finish mine. What do you think? :)

@vesari
Copy link
Contributor Author

vesari commented Apr 13, 2024

Awesome! Thank you very much! I'm on it.

@vesari vesari changed the title WIP: Add unit Optionally add OM unit May 7, 2024
@vesari vesari marked this pull request as ready for review May 7, 2024 07:44
@vesari
Copy link
Contributor Author

vesari commented May 7, 2024

@ArthurSens , I've marked this as "ready for review", but I'm still assuming your PR should be merged first?

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

I don't think this PR is blocked on #1408, but I think I would challenge adding units in this way (or adding units at all).

First of all, we can't change NewDesc, it's breaking change.

Then the explicit unit in v2.NewDesc and all metric type constructors' options. Essentially, are we sure asking users to add unit to all their definitions (given majority of the ecosystem metrics have unit in the metric name already) is a good use of their time for the benefit (and what is the benefit actually?)

BTW @vesari or anyone, feel free to challenge me! I might miss some use cases etc! 🤗

So there are two questions to answer:

  1. What's the benefit of having UNIT entry in metadata for OpenMetrics? Essentially what users would gain by using OpenMetrics with Units metadata entry? Right now, I see near zero value 🤔
    a. They would gain "formal compatibility with OpenMetrics 1.0, which means little (no scraper blocks client_golang OpenMetrics at the moment, so what's the point?) and we break compatibility for reasons (we think it would be damaging for Go Prometheus ecosystem to add e.g. _created metrics by default).
    b. Some third party systems would use it. Can we wait for actual use cases come to us? or maybe you have use case?

Wonder, what's your motivation to add units?

  1. What's the additional friction for defining metrics. How we can reduce it or automate it. Howe can reduce the things they need to touch, not increase in their code?

Just looking on changing client golang metrics, we have dozens of changes like this:

image

Also:

image

Now, do we think ALL of 118,491 repositories will check and be happy with extra line to every metric definition that literally duplicate a string. That's like billion code lines in total if everyone would just add one line to every metric with some unit across all our users, JUST in open source. We have to be careful here.

One idea would be to work on some autodetection mechanism that would try to detect units in metric names IF somebody enabled UNITs AND OpenMetrics format 🤔 Ideally we could have some bigger design/proposal for it. However, first question - remains. Do we need UNIT metadata entry?

To sum up, sorry for those challenging questions. Your code is top quality, thanks for massive work on this. It's just a controversial change a bit, and we have large user base, so we have to be more careful here 🤗 DMed you on Slack too, feel bad a bit I didn't respond earlier.

@@ -28,7 +28,7 @@ jobs:
- name: install Go
uses: actions/setup-go@0c52d547c9bc32b1aa3301fd7a9cb496313a4491 # v5.0.0
with:
go-version: 1.20.x
go-version: 1.21.x
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next time, this probably should be done in a separate PR 🤗

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this change was even needed 😬, could you explain why you decided to bump the go version here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as with the go.mod below. Just some change I should have reverted, sorry.

@@ -381,6 +387,14 @@ type HandlerOpts struct {
ProcessStartTime time.Time
}

type OpenMetricsOptions struct {
Enable bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could remove this field and use OpenMetricsOptions *OpenMetricsOptions to ensure it's optional? We cannot remove EnableOpenMetrics anyway so we could still use that, OR when OpenMetricsOptions is allocated assume OM is enabled? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I was kinda tying the possibility of enabling the optional unit to OM been already negotiated as the output format, but maybe there are better ways to ensure that.

prometheus/promhttp/http.go Outdated Show resolved Hide resolved
@@ -75,8 +77,8 @@ type Desc struct {
//
// For constLabels, the label values are constant. Therefore, they are fully
// specified in the Desc. See the Collector example for a usage pattern.
func NewDesc(fqName, help string, variableLabels []string, constLabels Labels) *Desc {
return V2.NewDesc(fqName, help, UnconstrainedLabels(variableLabels), constLabels)
func NewDesc(fqName, help, unit string, variableLabels []string, constLabels Labels) *Desc {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this is a breaking change. We can't change this function.

Not sure how to solve it. Probably NewDescWithUnit, or changing v2 only would make sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we can't make this change.

I wonder what would be a good option though, maybe something like below?

func (d *Desc) WithUnit(u string) *Desc {
  d.unit = u
  return d
}

A new function NewDescWithUnit also sounds doable, but honestly, a v2 would be the cleaner option. I'm not sure how viable v2 is though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll revert the breaking changes and explore the alternatives you both suggested.

@bwplotka
Copy link
Member

bwplotka commented May 9, 2024

I realized we already kind of agreed to supporting this... #684 so probably too late to challenge it cc @beorn7 @ArthurSens - do you remember what's the benefit other than "compatibility"?

The friction comment and breaking change stays. I feel we could do better than adding Unit field and asking everybody to duplicate this information 🤔

Perhaps some automation AND clear guidance e.g. in comment to NOT add it if you have correct unit in name would help?

Co-authored-by: Bartlomiej Plotka <[email protected]>
Signed-off-by: Arianna Vespri <[email protected]>
@vesari
Copy link
Contributor Author

vesari commented May 9, 2024

@bwplotka thanks, I'll await @beorn7 and @ArthurSens 's feedback on your questions, as they know the original motivation :)

@ArthurSens
Copy link
Member

ArthurSens commented May 10, 2024

I realized we already kind of agreed to supporting this... #684 so probably too late to challenge it cc @beorn7 @ArthurSens - do you remember what's the benefit other than "compatibility"?

At this point, I believe the main benefit is only compatibility with OpenMetrics 1.0, yes. I totally understand that OpenMetrics 1.0 has many details that hinge adoption in the Prometheus ecosystem, but I think we can't ignore that OM 1.0 exists and is used out there.

Prometheus already parses OM units, and stores it as metadata. So my rationale to accept this was to finally close the gap between go applications that wanted this information in Prometheus.

With that said, I admit I'm not aware of how units are used in Prometheus besides easy identification when a human reads PromQL. I was not part of the team when OpenMetrics became a thing, but I had the impression that units were added to the data model for a reason, and that work was being done (or planned to) to make use of units when making operations between metrics that were exposed with different units, e.g. http_requests_duration_seconds + http_requests_duration_miliseconds. I recognize that I could have done a better job here and questioned my own assumptions, but I can also see how units are important for Prometheus now that we're investing in OTLP compatibility. The metric data model for OTLP uses Unit as a metric identifier, which shows the importance for us to support units in client_golang.

Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to support units, let's think of alternative that isn't breaking changes to downstream projects :)

Comment on lines -55 to +56
func NewInfoVec(name, help string, labelNames []string) *InfoVec {
desc := prometheus.NewDesc(name, help, labelNames, nil)
func NewInfoVec(name, help, unit string, labelNames []string) *InfoVec {
desc := prometheus.NewDesc(name, help, unit, labelNames, nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar problem here, we can't change Public function's signatures

go 1.20
go 1.21
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we want to keep 1.20 until the next go release

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Yes, absolutely. Just a leftover from when I reverted the changes I had made before common was fixed in this sense.

@vesari
Copy link
Contributor Author

vesari commented May 11, 2024

If we decide to support units, let's think of alternative that isn't breaking changes to downstream projects :)

Of course! I'll await your decision whether to support them and, if so, I'll be happy to continue working on it the way you suggested :)

@beorn7
Copy link
Member

beorn7 commented May 14, 2024

I realized we already kind of agreed to supporting this... #684 so probably too late to challenge it cc @beorn7 @ArthurSens - do you remember what's the benefit other than "compatibility"?

It's hard for me to give a fair judgement here (because of my difficult history with OM). Having said that, the OM approach to units strikes me as weird.

I understand the historical Prometheus approach of "including the unit in the metric name" as a pragmatic solution to the lack of proper metadata support in Prometheus. It should be noted that it always has been just a convention.

OM added the unit metadata field, presumably because others asked for it (the OTel data model didn't exist back then, but I assume that the presence of a unit field in OTel comes from the same school of thinking). Now you might think, if OM has it as an explicit field, you don't have to include it in the metric name anymore. But OM did the opposite: If there is a unit field, the same string MUST be in the metric name, otherwise it is not valid OM. Personally, I think it should be possible to include the unit in the metric name (for those that like that for all the situation where you do not want to lookup the metadata even if you could), but given that others want to live in a world where metadata is always available at your fingertips, enforcing it in the metric name feels redundant.

Sadly, OTel took its guidance for Prometheus compatibility from OM, so the way the OTel data model is converted to Prometheus is already following the OM requirements, although there was never a Prometheus consensus about it.

Looping back to the question: How should we deal with the situation in client_golang? Given how far unit support has already seeped into various parts of the wider ecosystem, it's tough to say "this is all f*** up, let's just ignore it".
I think we should make it possible to expose the unit field for those that want it, but I would, for now, not enforce that the unit is also in the metric name (therefore accepting that we might generate invalid OM 1.0, but anticipating that this requirement is lifted in OM 2.0). It could be a note in the doc comment, like: "If you want to create valid OM 1.0, include the unit in the metric name, but this is not enforced." (I guess the OM vision was that you specify the name without the unit in instrumentation, then specify the unit separately (but only once), and then the actually exposed metric name includes the metric?)

How "1st class" adding the unit should be, is up to debate. I definitely wouldn't require all users to now add a unit. It should be an option for those that want it.

WRT to NewDesc: I once had an ambitious plan to get rid of (user-visible) Desc, see also #222 for issues with Desc. Not sure how to add unit without making everything even worse. Maybe NewDescWithUnit is OK for now.

@vesari
Copy link
Contributor Author

vesari commented May 16, 2024

Thank you all very much for your considerations and suggestions! In order not to enforce that the unit is also in the metric name when the unit field is populated, the logic in common has to be modified accordingly, by reverting some of the most recent changes that enforce exactly that. Of course, I'll be happy to do so. After that, I can work on the NewDescWithUnit approach for NewDesc. Let me know if I can proceed @ArthurSens @bwplotka @beorn7 :)

@beorn7
Copy link
Member

beorn7 commented May 16, 2024

To be clear: @ArthurSens @bwplotka @kakkoyun should make the call here. I'm just providing my personal context.

@ArthurSens
Copy link
Member

ArthurSens commented May 19, 2024

Hi @vesari!

The team met recently and one of the things we discussed was whether we accept OpenMetrics units or not.
A TL;DR; is: We won't accept this until Prometheus has a clear use case for unit metadata.


Reading the OpenMetrics specification, we can read[1][2]:

A MetricFamily MAY have zero or more Metrics. A MetricFamily MUST have a name, HELP, TYPE, and UNIT metadata. Every Metric within a MetricFamily MUST have a unique LabelSet.
.
.
.
Unit specifies MetricFamily units. If non-empty, it MUST be a suffix of the MetricFamily name separated by an underscore. Be aware that further generation rules might make it an infix in the text format.

We clearly understand that units are required by OpenMetrics 1.0, and we clearly understand that units must be used as metric name suffixes. However, at this point, if we accept this the only benefit would be to say "We're OpenMetrics 1.0 compatible". Being OpenMetrics 1.0 compatible is not a bad thing per se, but Prometheus hasn't used the Unit metadata for anything yet, so it brings Client_golang users little benefit while making our API a bit more complex.

If in the future Prometheus introduces the usage of Unit metadata, e.g. to properly sum metrics with different but compatible units (one in seconds and the other in milliseconds), we'd be happy to resurrect this.

In the name of the team, we apologize for taking this long to make a decision. Your work was quite involved, spreading contributions across multiple repositories and communicating with several stakeholders. We'd be happy to continue working with you if that's also of your interest :)

@ArthurSens
Copy link
Member

Maybe we could start brainstorming how to use the unit metadata in Prometheus? Although this would be a high-complexity project that I would also need a lot of time to understand properly😬

@vesari
Copy link
Contributor Author

vesari commented May 20, 2024

Hi @vesari!

The team met recently and one of the things we discussed was whether we accept OpenMetrics units or not. A TL;DR; is: We won't accept this until Prometheus has a clear use case for unit metadata.

Reading the OpenMetrics specification, we can read[1][2]:

A MetricFamily MAY have zero or more Metrics. A MetricFamily MUST have a name, HELP, TYPE, and UNIT metadata. Every Metric within a MetricFamily MUST have a unique LabelSet.
.
.
.
Unit specifies MetricFamily units. If non-empty, it MUST be a suffix of the MetricFamily name separated by an underscore. Be aware that further generation rules might make it an infix in the text format.

We clearly understand that units are required by OpenMetrics 1.0, and we clearly understand that units must be used as metric name suffixes. However, at this point, if we accept this the only benefit would be to say "We're OpenMetrics 1.0 compatible". Being OpenMetrics 1.0 compatible is not a bad thing per se, but Prometheus hasn't used the Unit metadata for anything yet, so it brings Client_golang users little benefit while making our API a bit more complex.

If in the future Prometheus introduces the usage of Unit metadata, e.g. to properly sum metrics with different but compatible units (one in seconds and the other in milliseconds), we'd be happy to resurrect this.

In the name of the team, we apologize for taking this long to make a decision. Your work was quite involved, spreading contributions across multiple repositories and communicating with several stakeholders. We'd be happy to continue working with you if that's also of your interest :)

@ArthurSens, no worries at all, I totally understand. Of course I’d be delighted to continue working with the team, thanks for suggesting that!

@vesari
Copy link
Contributor Author

vesari commented May 20, 2024

Maybe we could start brainstorming how to use the unit metadata in Prometheus? Although this would be a high-complexity project that I would also need a lot of time to understand properly😬

Either this or anything the team deem useful. Whatever you think needs attention, I’ll be happy to work on :) Feel free to DM me on Slack if necessary. Thank you!

@vesari
Copy link
Contributor Author

vesari commented Jul 31, 2024

As agreed with @bwplotka, we want to add units eventually, but this needs more thought on how to avoid adding significant boilerplate to all users for that; help wanted, closing for now.

@vesari vesari closed this Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OpenMetrics unit support (in v1)
4 participants