Dear Prometheus development team,
I have read the release notes for Prometheus 2.43 and noticed the new feature intended to address Prometheus sending resolved notifications prematurely, but I don't think it goes far enough. The new option is documented as follows:
How long an alert will continue firing after the condition that triggered it
has cleared.
[ keep_firing_for: <duration> | default = 0s ]
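This still doesn't solve the metric-loss problem: when a target goes down and its series disappear, the alert resolves as soon as the keep_firing_for duration elapses, even though the underlying condition may still hold. I have an idea for solving this and would suggest adding a new field: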
keep_firing_expr:
For example:
- alert: DiskSpaceUsage95
  expr: (100 - 100 * (windows_logical_disk_free_bytes{volume=~"[A-Z]:"} / windows_logical_disk_size_bytes{volume=~"[A-Z]:"})) > 95
  for: 1m
  keep_firing_expr: up{ $labels.instance } == 0
  labels:
    severity: warning
  annotations:
    description: Disk Space on Drive is used more than 95% VALUE = {{ $value }}
    summary: Disk Space Usage (instance {{ $labels.hostname }})
With this addition, the alert would be resolved only once up{ $labels.instance } == 1. While up{ $labels.instance } == 0, the alert would keep firing until the metrics are back.
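To make the intended behaviour concrete, here is a rough Python sketch of how I imagine one evaluation step could work. It is only an illustration of the proposal, not Prometheus code: should_fire and simulate are hypothetical names, each timeline entry stands for one rule evaluation, and the for: 1m pending phase is ignored for simplicity.

```python
def should_fire(was_firing: bool, expr_true: bool, keep_expr_true: bool) -> bool:
    """One hypothetical evaluation step for the proposed keep_firing_expr."""
    if expr_true:
        # The alert condition itself holds, so the alert fires as usual.
        return True
    # The condition returned nothing (for example, the series vanished).
    # Hold the alert only if it was already firing and keep_firing_expr matches.
    return was_firing and keep_expr_true


def simulate(steps):
    """Run a list of (expr_true, keep_expr_true) evaluations in order."""
    firing = False
    history = []
    for expr_true, keep_expr_true in steps:
        firing = should_fire(firing, expr_true, keep_expr_true)
        history.append(firing)
    return history


timeline = [
    (True, False),   # disk usage > 95%, target is up: alert fires
    (False, True),   # target down, metric missing, up == 0: alert is held
    (False, True),   # still down: alert is still held
    (False, False),  # target back and usage below threshold: alert resolves
]
print(simulate(timeline))  # [True, True, True, False]
```

Note that in this sketch keep_firing_expr alone never starts an alert; it only holds one that is already firing.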
What do you think about this idea?