SOLR-17587: (9x backport) wt=prometheus fix duplicate TYPE information #3006

Merged
merged 2 commits into from
Jan 9, 2025

Conversation

mlbiscoc
Contributor

@mlbiscoc mlbiscoc commented Jan 9, 2025

https://issues.apache.org/jira/browse/SOLR-17587

Description

Solr's Prometheus writer duplicates the # TYPE <metric name> <prometheus metric type> line in its exposition format output for core registry metrics.

This format is invalid, and stricter consumers of the Prometheus exposition format, for example Telegraf, will fail to parse it. For the Prometheus server itself, the output still passes and the metrics are collected just fine, for reasons that are unclear.

This happens because the Prometheus writer takes Dropwizard registries and exports them to Prometheus registries in order to expose them in Prometheus format. Solr creates a Dropwizard registry for every core and differentiates the metrics that way, even though they share the same metric names.

For Prometheus this is a problem, because metrics should be differentiated by their attributes and tags. When the metrics are output by the Prometheus response writer, there is one registry per core, and no registry knows that the other core registries contain the same metric name. The TYPE information is therefore written once per core and ends up duplicated.

Solution

When metrics are exported for Prometheus, we merge all the core Dropwizard metric registries into a single registry and export that registry to Prometheus. Prometheus does not allow duplicate metric names in a registry, so we also append the core name to the Dropwizard metric name to identify which core each metric belongs to, and parse the labels accordingly.
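The merge step described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: plain maps stand in for the Dropwizard MetricRegistry instances, and the class name, the "<core>.<metric>" naming scheme, and the sample metric names are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in: plain maps replace Dropwizard registries, and the
// "<core>.<metric>" naming scheme is an assumption, not Solr's actual format.
final class CoreRegistryMergeSketch {

  /** Merges per-core registries into one, tagging every metric name with its core. */
  static Map<String, Long> merge(Map<String, Map<String, Long>> registriesByCore) {
    Map<String, Long> merged = new LinkedHashMap<>();
    for (Map.Entry<String, Map<String, Long>> core : registriesByCore.entrySet()) {
      for (Map.Entry<String, Long> metric : core.getValue().entrySet()) {
        // Adding the core name keeps names unique inside the single merged registry.
        merged.put(core.getKey() + "." + metric.getKey(), metric.getValue());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Map<String, Long>> registries = new LinkedHashMap<>();
    registries.put("core1", Map.of("QUERY./select.requests", 3L));
    registries.put("core2", Map.of("QUERY./select.requests", 7L));
    // Both cores report the same metric name, yet the merged registry has two entries.
    System.out.println(merge(registries));
  }
}
```

With a single merged registry, the exporter sees each metric family once and can emit its TYPE line once, with the core carried as a label instead of as a separate registry.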

This also allowed us to clean up and simplify some of the SolrPrometheusCoreFormatter code.

Tests

Updated the tests to account for the coreName now present in the Dropwizard metric names and for its parsing.

Also added an assert in testPrometheusStructureOutput to confirm there is no duplicate TYPE information in the Prometheus output.
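A check of this kind might look like the sketch below. It is not the assertion from testPrometheusStructureOutput; the class name, method name, and the demo_requests_total metric are hypothetical, and the input is assumed to be plain Prometheus text exposition output.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not the assertion used in the Solr test suite.
final class DuplicateTypeCheckSketch {

  /** Returns the metric names that appear in more than one "# TYPE" line. */
  static List<String> duplicateTypeNames(String exposition) {
    Set<String> seen = new HashSet<>();
    List<String> duplicates = new ArrayList<>();
    for (String line : exposition.split("\\R")) {
      if (!line.startsWith("# TYPE ")) {
        continue; // sample lines and other comments are ignored
      }
      String[] parts = line.trim().split("\\s+");
      if (parts.length < 4) {
        continue; // malformed TYPE line, skipped in this sketch
      }
      String name = parts[2]; // "# TYPE <metric name> <metric type>"
      if (!seen.add(name) && !duplicates.contains(name)) {
        duplicates.add(name);
      }
    }
    return duplicates;
  }

  public static void main(String[] args) {
    String output = String.join("\n",
        "# TYPE demo_requests_total counter",
        "demo_requests_total{core=\"core1\"} 3.0",
        "# TYPE demo_requests_total counter",
        "demo_requests_total{core=\"core2\"} 7.0");
    System.out.println(duplicateTypeNames(output));
  }
}
```

A test would then assert that the returned list is empty for the writer's output.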

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

mlbiscoc and others added 2 commits January 9, 2025 09:35
Metrics: Prometheus response writer fix for non-compliant exposition format containing duplicate TYPE lines
Comment on lines +33 to +34
public static List<String> CLOUD_LABEL_KEYS = List.of("core", "collection", "shard", "replica");
public static List<String> STANDALONE_LABEL_KEYS = List.of("core");
Contributor Author

I used a static List of the keys for each regular expression instead of the Map, and loop through these to add the labels.
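As a rough illustration of that approach: the two key lists below are copied from the diff, while the class name, the regular expressions, and the labels method are assumptions made for this sketch and are not taken from SolrPrometheusCoreFormatter.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CoreLabelSketch {
  // Key lists as shown in the diff above.
  static final List<String> CLOUD_LABEL_KEYS = List.of("core", "collection", "shard", "replica");
  static final List<String> STANDALONE_LABEL_KEYS = List.of("core");

  // Hypothetical patterns: each named group matches one entry of the key list.
  static final Pattern CLOUD_PATTERN = Pattern.compile(
      "(?<core>(?<collection>.+)_(?<shard>shard\\d+)_(?<replica>replica_[a-z]\\d+))");
  static final Pattern STANDALONE_PATTERN = Pattern.compile("(?<core>.+)");

  /** Builds the label map by looping over the key list and reading the group of the same name. */
  static Map<String, String> labels(String coreName, boolean cloudMode) {
    Pattern pattern = cloudMode ? CLOUD_PATTERN : STANDALONE_PATTERN;
    List<String> keys = cloudMode ? CLOUD_LABEL_KEYS : STANDALONE_LABEL_KEYS;
    Map<String, String> labels = new LinkedHashMap<>();
    Matcher matcher = pattern.matcher(coreName);
    if (matcher.matches()) {
      for (String key : keys) {
        labels.put(key, matcher.group(key));
      }
    }
    return labels; // empty when the core name does not match the pattern
  }

  public static void main(String[] args) {
    System.out.println(labels("demo_shard1_replica_n1", true));
    System.out.println(labels("demo", false));
  }
}
```

Keeping the keys in a list tied to the group names means that adding a label only requires a new named group and a new list entry.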
