metrics-generator excessive err-mimir-sample-duplicate-timestamp #4201
Hi, agreed, this shouldn't be happening. We add a unique label to every series. Can you check some things?
Thanks! Our current theory is that this might be due to how we are injecting a 0 sample when adding new histogram series. Whenever we remote write a series for the first time, we inject a 0 sample right before appending the actual value, to clearly mark it as going from 0 to the new value:

tempo/modules/generator/registry/histogram.go Lines 190 to 199 in 1652db2

This code adds a sample with value 0 at a timestamp 1ms before the actual sample. We have similar code in the counter, but instead of injecting a sample 1ms earlier, we delay the next samples by 1s:

tempo/modules/generator/registry/counter.go Lines 157 to 169 in 1652db2
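To make the timing concrete, here is an illustrative sketch of the samples appended for a brand-new series (timestamps are examples, not the actual implementation):

    # histogram series, first real observation v at time t:
    #   t - 1ms  ->  0   (injected zero sample)
    #   t        ->  v   (actual sample)
    #
    # counter series, first value v at time t:
    #   t        ->  0   (injected zero sample)
    #   t + 1s   ->  v   (subsequent samples delayed by 1s)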
So if you only see histograms causing the duplicate sample errors, that is a clear indicator something in that implementation is not right.
Hi and thanks for your detailed analysis!
@kvrhdn I also tried enabling out-of-order sample ingestion, but this had no effect at all. We are still getting the same number of errors.
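For reference, a minimal sketch of what enabling out-of-order ingestion looks like on the Mimir side, assuming the per-tenant out_of_order_time_window limit is used (the 5m window is an arbitrary example):

    # Mimir per-tenant limits (sketch)
    limits:
      out_of_order_time_window: 5m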
Do you maybe have any relabel configs on your remote write? Maybe you are dropping labels that would make series unique. Or do you maybe have multiple sources of these metrics? (e.g. you are sending from both Tempo and Alloy)
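For illustration (hypothetical label name): a relabel rule like the one below would make two series that differ only in a pod label identical after relabeling, so their samples land on the same series at the same timestamps:

    write_relabel_configs:
      - action: labeldrop
        regex: pod   # if two series differ only in `pod`, they collapse into one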
We did some more investigation into this and it shouldn't be the issue. The errors we were seeing somewhere else were related to aggregation.
The only remote-write config we have is the one I posted in our config above. To my knowledge this should just rename these labels and not cause any loss in uniqueness:

storage:
  path: /var/tempo/wal
  remote_write:
    - send_exemplars: true
      url: ...
      write_relabel_configs:
        - regex: ^(.+)$
          source_labels:
            - http_method
          target_label: http_request_method
        - regex: ^(.+)$
          source_labels:
            - http_status_code
          target_label: http_response_status_code
        - action: labeldrop
          regex: ^http_method|http_status_code$
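As a side note, Prometheus fully anchors relabel regexes, so the unparenthesized alternation in that labeldrop still matches both labels; an equivalent, less ambiguous form would be:

    - action: labeldrop
      regex: http_method|http_status_code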
@kvrhdn it looks like removing the above mentioned write_relabel_configs got rid of the duplicate sample errors. Any idea what is causing issues with this relabel config? As I mentioned, I just want to rename http_method to http_request_method and http_status_code to http_response_status_code. The ingested metrics look exactly like we would expect.

With relabel config:

Without relabel config:
Nice find! Yeah, I'm not sure why this relabel config is causing issues (but I'm also not an expert on this).

Metrics-generator also has support built in to remap dimensions: it's the dimension_mappings config option. I think this should work:

metrics_generator:
  processor:
    span_metrics:
      dimension_mappings:
        - name: http_request_method
          source_labels:
            - http.method
          join: ''
        - name: http_response_status_code
          source_labels:
            - http.status_code
          join: ''

Note that …
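If the dimension_mappings route works, the write_relabel_configs could presumably be dropped from the remote write entirely, leaving something like:

    storage:
      path: /var/tempo/wal
      remote_write:
        - send_exemplars: true
          url: ...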
Describe the bug
Hello,
We are using the Tempo metrics-generator to generate span metrics from traces.
In general this works, however our metrics-generator is throwing lots of err-mimir-sample-duplicate-timestamp errors in the logs. The error is thrown on average about 250 times per minute:

Some sample log lines:

In our infrastructure this seems to mostly come from metrics generated for auto-instrumented Node.js services, however this might be the case for other services as well.
Expected behavior
err-mimir-sample-duplicate-timestamp should not be thrown regularly.

Environment:
Additional Context
metrics-generator config:
Thanks a lot for your help!