[Feature Request] HA Tracker support for multiple prometheus replicas in the same batch #6256

Open
eduardscaueru opened this issue Oct 9, 2024 · 6 comments · May be fixed by #6278 or #6279

Comments

@eduardscaueru

Is your feature request related to a problem? Please describe.
The HA Tracker mechanism Cortex provides is built around a Prometheus-only remote write, where every batch comes from a single replica. In our case, datapoints from multiple producers are mixed into one remote-written batch: some datapoints come from a Prometheus HA pair, while others come from other systems.
The distributor's HA Tracker implementation looks for the Prometheus replica label only in the first datapoint of the batch and assumes that all other datapoints in that batch come from the same Prometheus replica.
Thus, there are 3 scenarios, depending on the FIRST datapoint of the batch (a minimal sketch of this behavior follows the list):

  1. Doesn’t have Prometheus replica or cluster labels - the batch will be pushed
  2. Has the Prometheus replica label:
    2.1. If it is the same as the elected leader replica stored in the KV store, the batch will be pushed
    2.2. If it is NOT the same as the elected leader replica stored in the KV store, the batch will not be pushed
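
For illustration, here is a minimal Go sketch of the current behavior. The `Label`/`TimeSeries` types, the `haLabelsOf` helper and the hard-coded `cluster`/`__replica__` label names are simplified assumptions for this example and differ from the real Cortex types and configuration:

```go
package main

import "fmt"

// Label and TimeSeries are simplified stand-ins for the real Cortex types.
type Label struct{ Name, Value string }

type TimeSeries struct{ Labels []Label }

// haLabelsOf extracts the cluster and replica label values from one series.
func haLabelsOf(ts TimeSeries, clusterLabel, replicaLabel string) (cluster, replica string) {
	for _, l := range ts.Labels {
		switch l.Name {
		case clusterLabel:
			cluster = l.Value
		case replicaLabel:
			replica = l.Value
		}
	}
	return
}

func main() {
	// A mixed batch: two series coming from two different HA replicas.
	batch := []TimeSeries{
		{Labels: []Label{{"__name__", "s1"}, {"cluster", "T1"}, {"__replica__", "a"}}},
		{Labels: []Label{{"__name__", "s2"}, {"cluster", "T1"}, {"__replica__", "b"}}},
	}
	// Current behavior: only the FIRST series is inspected, and the whole
	// batch is accepted or rejected based on that single (cluster, replica) pair.
	cluster, replica := haLabelsOf(batch[0], "cluster", "__replica__")
	fmt.Printf("whole batch accepted or dropped based on (%s, %s)\n", cluster, replica)
}
```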

Describe the solution you'd like
One option would be to apply the same HA tracker mechanism after the batch has been split into smaller batches, one per (cluster, replica) pair. Instead of calling the findHALabels method, if the HA Tracker is enabled, add a method that separates these sub-batches, then iterate through them and discard only the sub-batch coming from a replica that is not the elected leader in the KV store. A rough sketch of this idea follows.
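
A hedged sketch of that split, reusing the simplified `TimeSeries`/`haLabelsOf` helpers from the snippet above; `checkReplica` is a hypothetical stand-in for the HA tracker's elected-leader lookup in the KV store, not the real Cortex API:

```go
// haPair identifies one Prometheus HA pair found inside a mixed batch.
type haPair struct{ cluster, replica string }

// filterBatch keeps every series that has no HA labels, groups the remaining
// series by their (cluster, replica) pair, and drops only the groups whose
// replica is not the elected leader according to checkReplica.
func filterBatch(batch []TimeSeries, checkReplica func(cluster, replica string) bool) []TimeSeries {
	groups := make(map[haPair][]TimeSeries)
	var accepted []TimeSeries
	for _, ts := range batch {
		cluster, replica := haLabelsOf(ts, "cluster", "__replica__")
		if cluster == "" || replica == "" {
			// Series without both HA labels are never deduplicated, so keep them.
			accepted = append(accepted, ts)
			continue
		}
		pair := haPair{cluster, replica}
		groups[pair] = append(groups[pair], ts)
	}
	for pair, series := range groups {
		// Only the sub-batch from a non-elected replica is discarded;
		// everything else in the request is still pushed.
		if checkReplica(pair.cluster, pair.replica) {
			accepted = append(accepted, series...)
		}
	}
	return accepted
}
```

With a split like this, a single request can carry samples from several HA pairs plus non-HA producers, and only the sub-batches from non-leader replicas are rejected.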

Describe alternatives you've considered
It is possible, in our services' implementation, to segregate datapoints and create HA-pair-specific batches. We would have dedicated batches for datapoints not sourced from HA pairs and one batch per HA pair.
The following diagram shows this client-side segregation, where T1 and T2 are cluster labels, a and b are replica labels (as in the official HA tracker docs), and s1 and s2 are different samples for each of these series.
[Diagram: cortex_ha_tracker_segregation.drawio]


@friedrichg
Member

friedrichg commented Oct 12, 2024

@eduardscaueru This will likely increase the distributor's CPU needs, since the whole batch has to be read.
Are you also willing to implement such an experimental feature?

@eduardscaueru
Author

Hello @friedrichg. Thank you for your response.

Yes, I would like to contribute and implement this feature.

@eduardscaueru
Author

eduardscaueru commented Oct 20, 2024

Hi @friedrichg

I implemented this feature in 2 ways:

  1. PR 6278 - For every timeseries in the request/batch, in the iteration inside the prepareSeriesKeys method, if the series has both cluster and replica labels, the KV store is checked to see whether the replica matches the elected one. If not, the timeseries is discarded.
  2. PR 6279 - An additional loop over the timeseries of the request is performed in order to find all unique HA (cluster, replica) pairs. For each of these pairs, the KV store is checked to see whether the replica matches the elected one, and only the valid pairs are stored in a map (for this request). Then, in the loop inside the prepareSeriesKeys method, if a datapoint has both cluster and replica labels and they are not found in the valid-HA-pairs map, the timeseries is discarded. A rough sketch of this approach follows the list.
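
A rough sketch of the second approach, again using the simplified types and the hypothetical `checkReplica` stand-in from the snippets in the issue description; the real PR works inside the distributor's push path rather than on these toy types:

```go
// validHAPairs is the extra pass over the request described above: each unique
// (cluster, replica) pair is checked against the KV store exactly once, and the
// resulting map lets the later per-series loop (prepareSeriesKeys in the real
// code) drop series whose pair maps to false.
func validHAPairs(batch []TimeSeries, checkReplica func(cluster, replica string) bool) map[haPair]bool {
	valid := make(map[haPair]bool)
	for _, ts := range batch {
		cluster, replica := haLabelsOf(ts, "cluster", "__replica__")
		if cluster == "" || replica == "" {
			continue // series without both HA labels are never dropped
		}
		pair := haPair{cluster, replica}
		if _, seen := valid[pair]; seen {
			continue // each pair is validated only once per request
		}
		valid[pair] = checkReplica(cluster, replica)
	}
	return valid
}
```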

PR 6279, where the valid HA pairs are stored in a map, will probably increase both the distributor's memory and CPU usage (as you suggested). On the other hand, PR 6278, where the KV store is queried every time a timeseries has both cluster and replica labels, will probably consume more CPU.

Waiting for a review. Thank you!

@eduardscaueru
Author

Hey @friedrichg @SungJin1212

Do you have any updates regarding the PRs?

Please let me know if there is anything to do from my side. Thanks!

@SungJin1212
Contributor

SungJin1212 commented Nov 2, 2024

@eduardscaueru
Sorry for the late reply.
The code itself is fine to me, but I'm still not sure whether it would be generally useful.
Besides, it seems like it inevitably hurts code readability.
@friedrichg
WDYT?

@friedrichg
Member

Sorry for the wait. I like #6278 more. Can you rebase it? I'd like some other maintainer to review it too.
