
Notification: Fix incorrectly dropped recovery & ACK notifications #10223

Open: yhabteab wants to merge 1 commit into master from fix-recovery-ack-notifications


@yhabteab (Member) commented Nov 8, 2024

Previously, recovery and ACK notifications were not delivered to a user who had not been notified about the problem state whenever the Problem type was enabled in that user's type filter. However, the type filter can also be configured at the Notification object level. If the Notification object's filter excludes Problem, the user can never receive the problem notification from that object, so the subsequent recovery and ACK notifications were dropped incorrectly. This PR changes the logic so that recovery and ACK notifications are dropped only if the Problem filter is configured at both the User and the Notification object levels.
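The difference between the two rules can be sketched as two predicates over type-filter bitmasks. This is a minimal illustration, not the code in lib/icinga/notification.cpp. The enum `NotificationTypeBit`, its bit values, and the functions `DropOld` and `DropNew` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit values; Icinga 2's real NotificationType enum may differ.
enum NotificationTypeBit : std::uint32_t {
	TypeProblem = 1u << 0,
	TypeRecovery = 1u << 1,
	TypeAcknowledgement = 1u << 2,
};

// Previous rule (sketch): drop a Recovery/ACK notification if the user was
// not notified about the problem and the *User* filter contains Problem.
constexpr bool DropOld(bool userWasNotified, std::uint32_t userTypes,
	std::uint32_t /* notificationTypes: ignored by the old rule */)
{
	return !userWasNotified && (userTypes & TypeProblem) != 0;
}

// Rule from this PR (sketch): drop only if Problem is enabled on *both* the
// User and the Notification object, i.e. only if a Problem notification
// could actually have reached this user through this Notification object.
constexpr bool DropNew(bool userWasNotified, std::uint32_t userTypes,
	std::uint32_t notificationTypes)
{
	return !userWasNotified
		&& (userTypes & TypeProblem) != 0
		&& (notificationTypes & TypeProblem) != 0;
}
```

With the test configuration's masks, the two rules disagree. The User has types `TypeProblem | TypeRecovery`, and Notification "recover" has types `TypeRecovery`. `DropOld(false, ...)` returns true, while `DropNew(false, ...)` returns false. This matches the Before/After logs.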

Tests

Icinga 2 Config
include <itl>

object CheckerComponent "checker" {}
object NotificationComponent "notification" {}

object NotificationCommand "send" {
	command = ["true"]
}

object Notification "recover" {
	host_name = "test"
	command = "send"
	users = ["icingaadmin"]
	types = [ Recovery ]
}

object Notification "problem" {
	host_name = "test"
	command = "send"
	users = ["icingaadmin"]
	types = [ Problem ]
}

object User "icingaadmin" {
	types = [ Problem, Recovery ]
}

object Host "test" {
	check_command = "dummy"
	max_check_attempts = 1
	check_interval = 10s

	vars.t = get_time()
	var that = this
	vars.dummy_state = () use (that) => {
		if (get_time() > (that.vars.t + 30s)) {
			that.vars.t = get_time()

			log("Host recovered")
			return 0
		} else {
			log("Host run into or is in problem state")
			return 2
		}
	}
	vars.dummy_text = "I'm just testing something"
}

Before

~/Workspace/icinga2 (master ✗) prefix/sbin/icinga2 daemon -c icinga2.conf
...
[2024-11-08 15:58:39 +0100] information/config: Host run into or is in problem state
[2024-11-08 15:58:39 +0100] information/Checkable: Checkable 'test' has 2 notification(s). Checking filters for type 'Problem', sends will be logged.
[2024-11-08 15:58:39 +0100] information/Notification: Sending 'Problem' notification 'test!problem' for user 'icingaadmin'
[2024-11-08 15:58:39 +0100] information/Notification: Completed sending 'Problem' notification 'test!problem' for checkable 'test' and user 'icingaadmin' using command 'send'.
[2024-11-08 15:58:49 +0100] information/config: Host run into or is in problem state
[2024-11-08 15:58:59 +0100] information/config: Host run into or is in problem state
[2024-11-08 15:59:09 +0100] information/config: Host run into or is in problem state
[2024-11-08 15:59:19 +0100] information/config: Host recovered
[2024-11-08 15:59:19 +0100] information/Checkable: Checkable 'test' has 2 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2024-11-08 15:59:19 +0100] information/Notification: Notification object 'test!recover': We did not notify user 'icingaadmin' (Problem types enabled) for a problem before. Not sending Recovery notification.
[2024-11-08 15:59:29 +0100] information/config: Host run into or is in problem state
[2024-11-08 15:59:29 +0100] information/Checkable: Checkable 'test' has 2 notification(s). Checking filters for type 'Problem', sends will be logged.
[2024-11-08 15:59:29 +0100] information/Notification: Sending 'Problem' notification 'test!problem' for user 'icingaadmin'
[2024-11-08 15:59:29 +0100] information/Notification: Completed sending 'Problem' notification 'test!problem' for checkable 'test' and user 'icingaadmin' using command 'send'.

After

~/Workspace/icinga2 (fix-recovery-ack-notifications ✗) prefix/sbin/icinga2 daemon -c icinga2.conf
...
[2024-11-08 15:53:42 +0100] information/config: Host run into or is in problem state
[2024-11-08 15:53:42 +0100] information/Checkable: Checkable 'test' has 2 notification(s). Checking filters for type 'Problem', sends will be logged.
[2024-11-08 15:53:42 +0100] information/Notification: Sending 'Problem' notification 'test!problem' for user 'icingaadmin'
[2024-11-08 15:53:42 +0100] information/Notification: Completed sending 'Problem' notification 'test!problem' for checkable 'test' and user 'icingaadmin' using command 'send'.
[2024-11-08 15:53:51 +0100] information/config: Host run into or is in problem state
[2024-11-08 15:54:00 +0100] information/config: Host run into or is in problem state
[2024-11-08 15:54:09 +0100] information/config: Host run into or is in problem state
[2024-11-08 15:54:18 +0100] information/config: Host recovered
[2024-11-08 15:54:18 +0100] information/Checkable: Checkable 'test' has 2 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2024-11-08 15:54:18 +0100] information/Notification: Sending 'Recovery' notification 'test!recover' for user 'icingaadmin'
[2024-11-08 15:54:18 +0100] information/Notification: Completed sending 'Recovery' notification 'test!recover' for checkable 'test' and user 'icingaadmin' using command 'send'.
[2024-11-08 15:54:27 +0100] information/config: Host run into or is in problem state
[2024-11-08 15:54:27 +0100] information/Checkable: Checkable 'test' has 2 notification(s). Checking filters for type 'Problem', sends will be logged.
[2024-11-08 15:54:27 +0100] information/Notification: Sending 'Problem' notification 'test!problem' for user 'icingaadmin'
[2024-11-08 15:54:27 +0100] information/Notification: Completed sending 'Problem' notification 'test!problem' for checkable 'test' and user 'icingaadmin' using command 'send'.
...
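The following small stateful simulation traces why `test!recover` behaves differently in the two runs. It assumes, as the "Notification object 'test!recover'" log line suggests, that the set of users already notified about the problem is tracked per Notification object. `NotificationSketch`, `Deliver`, the `fixed` switch, and the bit constants are hypothetical. The real code also evaluates state filters, time periods, and other conditions that are omitted here.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical type bits (not Icinga 2's actual enum values).
constexpr std::uint32_t kProblem = 1u << 0;
constexpr std::uint32_t kRecovery = 1u << 1;

// Minimal stand-in for a Notification object: its type filter plus the
// per-object set of users that already received a Problem notification.
struct NotificationSketch {
	std::uint32_t types;
	std::set<std::string> notifiedProblemUsers;
};

// Returns true if `user` would receive a notification of `type` from `n`.
// `fixed` selects the rule from this PR (true) or the previous rule (false).
bool Deliver(NotificationSketch& n, std::uint32_t type, const std::string& user,
	std::uint32_t userTypes, bool fixed)
{
	// Both type filters must allow the event type at all.
	if (!(n.types & type) || !(userTypes & type))
		return false;

	if (type == kProblem) {
		// Remember who was told about the problem by this object.
		n.notifiedProblemUsers.insert(user);
		return true;
	}

	if (type == kRecovery) {
		// Old rule: only the User filter is consulted.
		bool problemFilter = (userTypes & kProblem) != 0;
		// New rule: the Notification object must allow Problem as well.
		if (fixed)
			problemFilter = problemFilter && (n.types & kProblem) != 0;

		// Drop the recovery if the user was not told about the problem
		// and `problemFilter` says they could have been.
		if (!n.notifiedProblemUsers.count(user) && problemFilter)
			return false;
	}

	return true;
}
```

Because "recover" only allows Recovery, its `notifiedProblemUsers` set stays empty. The old rule therefore always drops its recovery, while the new rule sends it.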

fixes #10211

@cla-bot (bot) added the cla/signed label Nov 8, 2024
@icinga-probot (bot) added the bug (Something isn't working) and ref/IP labels Nov 8, 2024
@yhabteab added the area/notifications (Notification events) label Nov 8, 2024
@yhabteab added this to the 2.15.0 milestone Nov 8, 2024
@yhabteab force-pushed the fix-recovery-ack-notifications branch from 16dedc7 to 557a8af November 8, 2024 15:08
@Al2Klimov (Member) left a comment

👍

Also, this is a stricter condition for not sending a notification, so we can't lose notifications this way. 👍
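Both rules depend on only three booleans, so the "stricter condition" argument can be checked exhaustively. The sketch below reduces the rules to hypothetical functions `OldDrops` and `NewDrops`, which are not names from the PR. It then lets the compiler verify that the new rule drops a notification only where the old one already did.

```cpp
#include <cassert>

// Hypothetical reduction of both rules to three booleans:
//   notified     - user is already in the Notification's notified-problem set
//   userProblem  - the User object's `types` contains Problem
//   notifProblem - the Notification object's `types` contains Problem
constexpr bool OldDrops(bool notified, bool userProblem, bool /* notifProblem */)
{
	return !notified && userProblem;
}

constexpr bool NewDrops(bool notified, bool userProblem, bool notifProblem)
{
	return !notified && userProblem && notifProblem;
}

// Walks all 8 input combinations and fails if the new rule drops a
// notification in a case where the old rule would have delivered it.
constexpr bool NewRuleIsStricter()
{
	for (int bits = 0; bits < 8; ++bits) {
		const bool notified = bits & 1;
		const bool userProblem = bits & 2;
		const bool notifProblem = bits & 4;
		if (NewDrops(notified, userProblem, notifProblem)
			&& !OldDrops(notified, userProblem, notifProblem))
			return false;
	}
	return true;
}

static_assert(NewRuleIsStricter(), "new rule must never drop more than the old one");
```

The two rules differ in exactly one combination: the user was not notified, the User filter contains Problem, and the Notification filter does not. That is the case this PR fixes.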

However:

@@ -428,9 +428,14 @@ void Notification::BeginExecuteNotification(NotificationType type, const CheckRe
continue;
}

// Verify if the 'Problem' filter is configured at both the User and Notification object levels.
bool foundProblemFilter = NotificationProblem & user->GetTypeFilter() && NotificationProblem & GetTypeFilter();

/* on recovery, check if user was notified before */
if (type == NotificationRecovery) {
Al2Klimov (Member):
I don't see a point in doing this in advance. What if type doesn't match?

yhabteab (Member, Author):
> I don't see a point in doing this in advance.

The point is to deduplicate the logic, ensuring that changes to one part will automatically be reflected in the other, preventing any oversight.

> What if type doesn't match?

And what issue do you see in that case? It's just a boolean flag computed from two bitwise checks, so whether the type matches shouldn't be relevant.
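Below is a sketch of the structure being defended here. It assumes hypothetical type bits and a hypothetical `ShouldSkipUser` helper. It also assumes, as the PR description states, that recovery and ACK notifications are gated on the same condition. The flag is computed once and read by both branches, so the two conditions cannot drift apart. In C++, `&` binds tighter than `&&`, so the parenthesized expression below means the same as the one in the diff. The parentheses only make the precedence explicit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type bits; Icinga 2's real NotificationType values may differ.
enum : std::uint32_t {
	kProblem = 1u << 0,
	kAcknowledgement = 1u << 1,
	kRecovery = 1u << 2,
};

// Returns true if the user should be skipped for a notification of `type`.
bool ShouldSkipUser(std::uint32_t type, bool userWasNotified,
	std::uint32_t userFilter, std::uint32_t notificationFilter)
{
	// Computed once for every type, even ones that never read it; the cost
	// is at most two bitwise ANDs, and both branches share one definition.
	const bool foundProblemFilter = (kProblem & userFilter) && (kProblem & notificationFilter);

	// Recovery: skip users who could have been told about the problem but weren't.
	if (type == kRecovery && !userWasNotified && foundProblemFilter)
		return true;

	// Acknowledgement: same condition, reusing the same flag.
	if (type == kAcknowledgement && !userWasNotified && foundProblemFilter)
		return true;

	return false;
}
```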

lib/icinga/notification.cpp (outdated, resolved)
Comment on lines 436 to 440
// Do not send a recovery notification to the current user if he was not previously notified of the
// problem state, while containing the 'Problem' filter at both the user and notification object levels.
if (!notifiedProblemUsers->Contains(userName) && foundProblemFilter) {
Contributor:
That comment mostly repeats the condition but doesn't say why. Something like "Don't notify the user about the recovery for a problem they weren't notified about, unless they are explicitly configured to receive recovery notifications but no problem notifications." would be more helpful in my opinion as it gives a bit more context.

yhabteab (Member, Author):
> unless they are explicitly configured to receive recovery notifications but no problem notifications.

This reasoning doesn't seem logical to me, as the tests in the PR description illustrate: there, the user is configured to receive both Problem and Recovery notifications.

Contributor:
Okay, maybe it needs a bit more elaboration. What the "unless" part was supposed to say is that the check only makes sense if it is possible at all for that user to show up in notifiedProblemUsers, which is only the case if both the involved Notification and User objects allow problem notifications.

Labels
area/notifications (Notification events), bug (Something isn't working), cla/signed, ref/IP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Different behavior for types = [Recovery] between User and Notification objects
3 participants