Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make metrics from carbon-cache go via relay config #959

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

bucko909
Copy link
Contributor

This is necessary for any hashed sharding setup: The cache will write metrics to its local cache, not to the correct cache for the hashing scheme. This means only datapoints synced to disk are available. Worse, in a highly available setup, metrics differ depending on which node you happen to be fetching from.

@bucko909
Copy link
Contributor Author

bucko909 commented Aug 21, 2024

This obviously isn't an ideal patch, as it requires a valid relay config to exist in the cache. Our solution is to set up the relay to submit to the HA router port, hence the ConstantRouter in the solution (I didn't want to mess with the hashing configuration for the HA setup). This is obviously optional, but might be useful for other reasons.

Probably this should be a config option, something like CACHE_METRICS_VIA_RELAY = False.

@bucko909
Copy link
Contributor Author

Update just splits out the ConstantRouter.

@deniszh
Copy link
Member

deniszh commented Aug 24, 2024

@bucko909 : tests should be amended for changed behaviour, could you please amend it? Thanks!

@bucko909
Copy link
Contributor Author

I can do, but I think I need agreement on what the changed behaviour should be -- what's here cannot be shipped as it'll break existing configs.

"Proper" approaches are:

  • Simply to require a default-off config setting to enable this behaviour, though it'll then remain as a trap for anyone who tries to set up a HA setup and doesn't thoroughly read the config file. Perhaps it can emit a warning if there are multiple instances configured but this setting is turned off?
  • Have the cache read the relay section of the config file and copy the relay's submission behaviour. Obviously this is quite confusing, as the cache won't work without the relay configured, and it'll probably break single-pipeline setups (which don't need this anyway).

I'll submit a patch with the former behaviour and a warning for now; you can decide if that's appropriate. The tests will then work fine if I do this, though you may also want to require a test for the new behaviour. I'll see if I can get one to work.

@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 60.00000% with 8 lines in your changes missing coverage. Please review.

Project coverage is 50.67%. Comparing base (fdc56f6) to head (e50d01a).
Report is 17 commits behind head on master.

Files with missing lines Patch % Lines
lib/carbon/instrumentation.py 0.00% 3 Missing ⚠️
lib/carbon/routers.py 80.00% 3 Missing ⚠️
lib/carbon/service.py 0.00% 1 Missing and 1 partial ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #959      +/-   ##
==========================================
+ Coverage   50.63%   50.67%   +0.03%     
==========================================
  Files          36       36              
  Lines        3446     3467      +21     
  Branches      535      528       -7     
==========================================
+ Hits         1745     1757      +12     
- Misses       1574     1582       +8     
- Partials      127      128       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bucko909
Copy link
Contributor Author

I've added the setting and some warnings in the config file. I'm not sure it'd be possible to add a warning about configuration easily, because one daemon needs to warn depending on another's config, which isn't really in the design spec of the configuration's loader.

Some caveats:

  • This is untested in its current configuration, though hopefully clearly won't break any existing installation. I'll be back at work next week and can probably test this more thoroughly on one of our live deployments then, though I've had an extended absence so it's unlikely to be my first priority.
  • The test for ConstantRouter is a bit of a hack. I think realistically I should probably just remove it -- it's equivalent to any hashing approach with REPLICATION_FACTOR = len(DESTINATIONS) anyway. I think my adding it was simply misguided.

Let me know if you think this is reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants