SOLR-17587: Prometheus Writer duplicate TYPE information in exposition format #2902
Going to bump this PR. It's not a blocker for standard Prometheus server metrics collection, but it can potentially block users of other exporters/collectors.
Correct me if I'm wrong, but had you kept the test that you removed because hossman didn't like it (it was imperfect), a reviewer (like me, but you too) would be able to see a diff of the output format in this PR and understand the impact (or lack of impact).
I could say the same thing for dependencies. People add/remove dependencies in build files (Maven/Gradle/Ivy), but what we ultimately want to know is which JARs changed in the final distribution.
In theory, yes, you would see the difference that is happening here. But since you can't see the output, here is basically the change to the output:
Before:
After:
solr/core/src/java/org/apache/solr/handler/admin/MetricsHandler.java
solr/core/src/java/org/apache/solr/metrics/prometheus/core/SolrCoreCacheMetric.java
solr/core/src/java/org/apache/solr/metrics/prometheus/core/SolrCoreMetric.java
Getting much better now
@@ -552,6 +549,11 @@ private List<MetricType> parseMetricTypes(SolrParams params) {
    return metricTypes;
  }

  private String getCoreNameFromRegistry(String registryName) {
    String coreName = registryName.substring(registryName.indexOf('.') + 1);
    return coreName.replaceAll("\\.", "_");
For a single-character find & replace, just use replace (String.replace(char, char)) rather than the regex-based replaceAll. You'll see by code inspection it's much faster.
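To make the suggestion concrete, here is a minimal sketch comparing the two calls on the helper from the diff above. The class name is hypothetical, and "solr.core.techproducts" is only an assumed example of a registry name. String.replace(char, char) swaps characters directly, whereas replaceAll(String, String) treats its first argument as a regular expression and compiles it on every call, which is where the extra cost comes from.

```java
/**
 * Sketch (hypothetical class) comparing the diff's regex-based replaceAll
 * with the single-character replace suggested in review.
 */
final class CoreNameFromRegistry {

  // Variant as written in the diff: replaceAll compiles the regex "\\." on each call.
  static String withReplaceAll(String registryName) {
    String coreName = registryName.substring(registryName.indexOf('.') + 1);
    return coreName.replaceAll("\\.", "_");
  }

  // Variant suggested in review: a plain char-for-char swap, no regex involved.
  static String withReplace(String registryName) {
    String coreName = registryName.substring(registryName.indexOf('.') + 1);
    return coreName.replace('.', '_');
  }

  public static void main(String[] args) {
    String registry = "solr.core.techproducts"; // assumed example registry name
    System.out.println(withReplace(registry)); // core_techproducts
    System.out.println(withReplace(registry).equals(withReplaceAll(registry))); // true
  }
}
```

Both variants return the same string; only the mechanism differs.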
solr/core/src/java/org/apache/solr/metrics/prometheus/core/SolrCoreMetric.java
throw new SolrException(
    SolrException.ErrorCode.SERVER_ERROR,
    "Error occurred exporting Dropwizard Metric to Prometheus",
    e);
Also wanted to mention this. I was thinking of removing the log.warn and throwing a SolrException instead. At the time, I thought that not failing metrics entirely, and returning even partial metrics, was OK. But after some thought: throwing here will fail the whole metrics response, but it lets the export tests catch anything that goes wrong, and lets a user surface a bug. WDYT?
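A minimal sketch of the two error-handling policies being weighed here. It is not Solr code: RuntimeException stands in for SolrException so the example compiles without Solr on the classpath, and the exporter function and metric names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of lenient (log and skip) vs. strict (fail the export) handling. */
final class ExportPolicies {

  // Lenient: a failing metric is logged and skipped, so callers get partial results.
  static List<String> exportLenient(List<String> metrics, Function<String, String> exporter) {
    List<String> out = new ArrayList<>();
    for (String m : metrics) {
      try {
        out.add(exporter.apply(m));
      } catch (RuntimeException e) {
        System.err.println("WARN: skipping " + m + ": " + e.getMessage()); // stand-in for log.warn
      }
    }
    return out;
  }

  // Strict: the first failure aborts the whole export (stand-in for throwing SolrException).
  static List<String> exportStrict(List<String> metrics, Function<String, String> exporter) {
    List<String> out = new ArrayList<>();
    for (String m : metrics) {
      try {
        out.add(exporter.apply(m));
      } catch (RuntimeException e) {
        throw new RuntimeException("Error occurred exporting Dropwizard Metric to Prometheus", e);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> metrics = List.of("requests", "broken", "errors");
    Function<String, String> exporter = m -> {
      if (m.equals("broken")) {
        throw new IllegalStateException("cannot convert " + m);
      }
      return "solr_" + m;
    };
    System.out.println(exportLenient(metrics, exporter)); // [solr_requests, solr_errors]
    try {
      exportStrict(metrics, exporter);
    } catch (RuntimeException e) {
      System.out.println("strict export failed: " + e.getMessage());
    }
  }
}
```

The lenient version keeps serving whatever it could convert and hides the bug in a log line; the strict version turns the same bug into a visible failure that tests will catch.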
+1, looks great to me. I forgot about this PR; sorry. Can you suggest CHANGES.txt wording? (A comment here is fine.)
No problem, thanks for reviewing. How about, under Bug Fixes: "Metrics: Prometheus response writer fix for non-compliant exposition format containing duplicate TYPE lines"
Apparently, Matcher.namedGroups() was only added in JDK 20. Can you please do a branch_9x backport PR that cherry-picks this change and modifies it for JDK 11? I'm thinking: declare a static Map of the named groups for each regexp.
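A sketch of the suggested JDK 11 workaround, assuming a hypothetical metric-name pattern (Solr's actual regexes are not shown in this thread). Matcher.group(String) and Matcher.group(int) have existed since Java 7 and earlier, so only the namedGroups() lookup needs replacing; here a static map from group name to group index is declared by hand next to the pattern.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** JDK 11-compatible stand-in for namedGroups(); pattern and group names are hypothetical. */
final class NamedGroupsJdk11 {

  static final Pattern CORE_METRIC =
      Pattern.compile("^(?<core>core_[^.]+)\\.(?<category>[A-Z_]+)\\.(?<name>.+)$");

  // Hand-maintained name -> index map; must stay in sync with CORE_METRIC.
  // Named groups are numbered by the order of their opening parentheses.
  static final Map<String, Integer> CORE_METRIC_GROUPS =
      Map.of("core", 1, "category", 2, "name", 3);

  /** Returns group name -> captured text, or an empty map if the input does not match. */
  static Map<String, String> extract(Pattern pattern, Map<String, Integer> groups, String input) {
    Matcher m = pattern.matcher(input);
    Map<String, String> out = new HashMap<>();
    if (!m.matches()) {
      return out;
    }
    for (Map.Entry<String, Integer> g : groups.entrySet()) {
      out.put(g.getKey(), m.group(g.getValue())); // group(int) works on every JDK
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(extract(CORE_METRIC, CORE_METRIC_GROUPS, "core_c1.QUERY./select.requests"));
  }
}
```

The cost of this approach is that each map must be kept in sync with its pattern manually; on JDK 20+ the same information comes from namedGroups() instead.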
If you haven't started, I will. It's only slightly more work for me, since my IDE is already set up for the change.
No worries, I got it. I'll push a PR to 9x soon.
PR for 9x backport
Metrics: Prometheus response writer fix for non-compliant exposition format containing duplicate TYPE lines
9x backport of main:
* SOLR-17587: wt=prometheus fix duplicate TYPE information (#2902)
https://issues.apache.org/jira/browse/SOLR-17587
Description
Solr's Prometheus writer duplicates the "# TYPE <metric name> <prometheus metric type>" line in its exposition format for core registry metrics. This is an illegal format, and depending on the technology's Prometheus exposition validation (for example, Telegraf), this will fail. Prometheus server itself still accepts it and collects the metrics just fine, for some reason.

This happens because the Prometheus writer takes Dropwizard registries and exports them to Prometheus registries to expose them in Prometheus format. Solr creates a Dropwizard registry for every core and differentiates the metrics that way, even though they have the same metric names. For Prometheus, this is a problem, because metrics should be differentiated by their attributes and tags (labels). So when the metrics are output with the Prometheus response writer, the TYPE information is duplicated: there is a registry for every core, and none of them knows that the other core registries contain the same metric name.
Solution
When metrics are exported for Prometheus, we merge all the core Dropwizard metric registries into a single registry and export that registry to Prometheus. Duplicate metric names in a registry are not allowed in Prometheus, so we also append the core name to the Dropwizard metric name, to differentiate which metric belongs to which core, and parse the labels accordingly. This also allowed cleaning up and simplifying some of the SolrPrometheusCoreFormatter code.
Tests
Updated the tests accordingly for the core name now present in the Dropwizard metric names, and for its parsing.
Also added an assert in testPrometheusStructureOutput to confirm there is no duplicate TYPE information in the Prometheus output.
Checklist
Please review the following and check all that apply:
main branch.
./gradlew check.
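For illustration, here is a simplified sketch of the duplicate-TYPE problem from the Description and of the kind of no-duplicate-TYPE check described under Tests. Plain maps stand in for the per-core Dropwizard registries, every metric is emitted as a gauge, and all class, method and metric names are hypothetical. The sketch also groups samples by metric name directly, rather than reproducing the PR's approach of appending the core name to the Dropwizard metric name and parsing it back into labels.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical demo: per-registry export vs. merged export in simplified Prometheus text format. */
final class TypeLineDemo {

  // Naive export: each core registry is written independently, each with its own TYPE line.
  static String exportPerRegistry(Map<String, Map<String, Double>> registries) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Map<String, Double>> reg : registries.entrySet()) {
      for (Map.Entry<String, Double> m : reg.getValue().entrySet()) {
        sb.append("# TYPE ").append(m.getKey()).append(" gauge\n");
        sb.append(m.getKey()).append("{core=\"").append(reg.getKey()).append("\"} ")
            .append(m.getValue()).append('\n');
      }
    }
    return sb.toString();
  }

  // Merged export: collect samples from all cores per metric name, then one TYPE line per family.
  static String exportMerged(Map<String, Map<String, Double>> registries) {
    Map<String, List<String>> families = new LinkedHashMap<>();
    for (Map.Entry<String, Map<String, Double>> reg : registries.entrySet()) {
      for (Map.Entry<String, Double> m : reg.getValue().entrySet()) {
        families.computeIfAbsent(m.getKey(), k -> new ArrayList<>())
            .add(m.getKey() + "{core=\"" + reg.getKey() + "\"} " + m.getValue());
      }
    }
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, List<String>> f : families.entrySet()) {
      sb.append("# TYPE ").append(f.getKey()).append(" gauge\n");
      for (String sample : f.getValue()) {
        sb.append(sample).append('\n');
      }
    }
    return sb.toString();
  }

  // True if any metric name has more than one "# TYPE" line.
  static boolean hasDuplicateTypeLines(String exposition) {
    Set<String> seen = new HashSet<>();
    for (String line : exposition.split("\n")) {
      if (line.startsWith("# TYPE ")) {
        String name = line.split(" ")[2];
        if (!seen.add(name)) {
          return true;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Map<String, Map<String, Double>> registries = new LinkedHashMap<>();
    registries.put("core1", Map.of("solr_core_requests", 3.0));
    registries.put("core2", Map.of("solr_core_requests", 5.0));
    System.out.println(hasDuplicateTypeLines(exportPerRegistry(registries))); // true
    System.out.println(hasDuplicateTypeLines(exportMerged(registries))); // false
  }
}
```

With two cores exposing the same metric name, the per-registry output contains two TYPE lines for it, which strict parsers such as Telegraf reject; the merged output has one TYPE line followed by one sample per core.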