Icinga 2 API as Event Source #112
Conversation
Comments from the first pass over the code. In general, I will need to have a closer look at the event deduplication and whether we want to change it in general, so that it would be able to generate events for the time while the notifications daemon wasn't running (but those already get lost with the current hacky approach, so no downgrade in that respect).
From what I can tell, you only try to remove duplicates, but there's nothing that would prevent reordering? Also, it looks like you're using the timestamps from Icinga 2 for the events. So far, all events receive their timestamp from the notifications daemon:
I think I'd keep it that way for now; after all, we can't send notifications in the past. (Maybe, as a future extension, we might want to track both source and processing timestamps and show hints that events were delayed.) Anyway, I have a suggestion that shouldn't be too difficult to implement and would also handle events that were missed due to a notifications daemon restart:
@julianbrost: I did an implementation with some necessary refactoring in the three just-pushed commits. Please feel free to take a look.
While working on #112, I experienced some delay when replaying the whole Icinga 2 Objects API state after an Event Stream connection loss. Some pprof snapshots revealed long-held locks, so I minimized the scopes of the mutex-protected code areas. Especially the GetCurrent function held a long lock, even across SQL queries. However, in most experimental sessions the locks themselves were insignificant, while the SQL internals consumed huge time slots.
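A minimal sketch of that kind of lock-scope reduction, with a hypothetical `cache` type standing in for the real data structure guarded by GetCurrent: the mutex only covers the in-memory access, while the SQL query runs unlocked.

```go
package objectcache

import (
	"context"
	"database/sql"
	"sync"
)

// cache is a hypothetical stand-in for the shared state guarded by GetCurrent.
type cache struct {
	mu      sync.Mutex
	entries map[int64]string
}

// GetCurrent illustrates the narrowed critical section: only the map access is
// guarded, while the potentially slow SQL query runs without holding the lock.
func (c *cache) GetCurrent(ctx context.Context, db *sql.DB, id int64) (string, error) {
	c.mu.Lock()
	v, ok := c.entries[id]
	c.mu.Unlock()
	if ok {
		return v, nil
	}

	// The slow part happens outside the lock, so other goroutines aren't blocked.
	if err := db.QueryRowContext(ctx, "SELECT name FROM objects WHERE id = ?", id).Scan(&v); err != nil {
		return "", err
	}

	c.mu.Lock()
	c.entries[id] = v
	c.mu.Unlock()
	return v, nil
}
```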
Switch to a signal-based context which is canceled for SIGINT and SIGTERM. This change was extracted from the ongoing PR #112.
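For reference, the standard library provides exactly this via `signal.NotifyContext`; a minimal sketch (not necessarily the extracted change verbatim):

```go
package main

import (
	"context"
	"fmt"
	"os/signal"
	"syscall"
)

func main() {
	// ctx is canceled as soon as SIGINT or SIGTERM is received.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	// All long-running work derives from ctx and shuts down on signal.
	<-ctx.Done()
	fmt.Println("received termination signal, shutting down")
}
```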
See inline comments for some details. On a higher level, I'd consider changing two things:
- Currently, states and acknowledgements are fetched completely independently. This means those could be out of sync (you might acknowledge too early or too late in relation to the state events). So this should be more consistent if there was one request to `/v1/objects/hosts` and one to `/v1/objects/services` which is used as the baseline for everything (severity, acknowledgement, whatever we might add in the future). You emit a state event for that state and if it is flagged as acknowledged, you fetch the necessary extra information and emit another event for that as well (after the state event).
- In `eventDispatcher()`, I find the use of `replayTrigger` and `replayPhase` somewhat confusing. Maybe that would be simpler if there was one call to `eventDispatcher()` per event stream request, i.e. it would only handle one replay phase overall (but you'd probably have to be careful to only start the next one after the previous is gone for sure and won't submit further events for processing). Other ideas I'd consider: use only channels for signalling, i.e. replace `replayPhase` with one and maybe close `eventDispatcherReplay` to signal that the replay is done (sketched below).
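The close-to-signal idea from the second point could look roughly like the following Go sketch; all names (`Event`, `dispatch`, the channel parameters) are illustrative and not taken from the PR. The point is only that closing the replay channel replaces an explicit `replayPhase` flag.

```go
package dispatchexample

import "context"

// Event is a hypothetical placeholder for the events being dispatched.
type Event struct{}

// dispatch handles exactly one replay phase followed by normal operation. The
// producer closes the replayed channel to signal that the replay is finished.
func dispatch(ctx context.Context, replayed, live <-chan Event, handle func(Event)) {
	// Phase 1: replay. Events collected while the daemon was disconnected are
	// consumed until the producer closes the channel.
replay:
	for {
		select {
		case <-ctx.Done():
			return
		case ev, ok := <-replayed:
			if !ok {
				break replay // channel closed: replay finished
			}
			handle(ev)
		}
	}

	// Phase 2: normal operation, live events from the event stream only.
	for {
		select {
		case <-ctx.Done():
			return
		case ev := <-live:
			handle(ev)
		}
	}
}
```

The producer side would close `replayed` exactly once after the catch-up request has finished, which also guarantees the dispatcher never mixes the two phases.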
I addressed the annotated comments within the code. However, the two main bullet points are still open.
Force-pushed from 906310b to 500b2a9.
By introducing ErrSuperfluousStateChange to signal superfluous state changes and returning a wrapped error, those messages can now be suppressed (logged at the debug level) for Event Stream processing.
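A sketch of the pattern the commit describes; `ErrSuperfluousStateChange` is the name from the commit, while the logger, the handler, and the log messages are illustrative stand-ins:

```go
package eventexample

import (
	"errors"
	"fmt"
	"log/slog"
)

// ErrSuperfluousStateChange signals a state change that is already known and
// therefore does not need to result in a notification event.
var ErrSuperfluousStateChange = errors.New("superfluous state change")

// handleStateChange is a hypothetical producer returning the wrapped sentinel.
func handleStateChange(host string, alreadyKnown bool) error {
	if alreadyKnown {
		return fmt.Errorf("state change for host %q: %w", host, ErrSuperfluousStateChange)
	}
	// ... actual processing ...
	return nil
}

// processEvent demotes exactly this error to the debug level; everything else
// is still reported as an error.
func processEvent(logger *slog.Logger, host string, alreadyKnown bool) {
	if err := handleStateChange(host, alreadyKnown); err != nil {
		if errors.Is(err, ErrSuperfluousStateChange) {
			logger.Debug("ignoring superfluous state change", "error", err)
			return
		}
		logger.Error("cannot process state change", "error", err)
	}
}
```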
After #132 got merged and each Source's state now lives in the database, the Event Stream's configuration could go there, too. This resulted in some refactoring, as the data flow logic is now reversed at some points. Especially Go's prohibition of cyclic imports and the omnipresence of the RuntimeConfig made the "hack" of the eventstream.Launcher necessary to avoid an import edge from config to eventstream.
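To make the import-cycle argument concrete, here is a minimal dependency-inversion sketch, not the PR's actual Launcher code: the config side only knows a callback type, and the concrete event stream implementation is injected at startup, so no import edge from config to eventstream is needed.

```go
package configexample

// SourceLaunchFunc decouples the config code from the event stream code: the
// config side only calls this function value and never imports the eventstream
// package, so no import cycle can arise. (All names here are illustrative.)
type SourceLaunchFunc func(sourceID int64)

// RuntimeConfig is a tiny stand-in for the real RuntimeConfig.
type RuntimeConfig struct {
	LaunchSource SourceLaunchFunc // injected in main(), implemented by the event stream code
}

// applySource would be called when a new or changed source row is loaded from
// the database; launching the event stream client happens through the callback.
func (r *RuntimeConfig) applySource(sourceID int64) {
	if r.LaunchSource != nil {
		r.LaunchSource(sourceID)
	}
}
```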
This refactoring aligns the representation of the Icinga 2 API's Unix timestamps with the one used by Icinga DB. Eventually, this common or similar code will be extracted into the icinga-go-library.
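For illustration, a minimal timestamp type of that kind, assuming the Icinga 2 API encodes timestamps as Unix seconds with a fractional part; the name `UnixFloat` and the exact conversion are a sketch, not necessarily identical to the Icinga DB or PR types:

```go
package timeexample

import (
	"bytes"
	"encoding/json"
	"time"
)

// UnixFloat converts an Icinga 2 API timestamp, encoded as Unix seconds with a
// fractional part (e.g. 1697704136.859), into a time.Time while unmarshaling.
type UnixFloat time.Time

func (t *UnixFloat) UnmarshalJSON(data []byte) error {
	if bytes.Equal(data, []byte("null")) {
		return nil
	}

	var seconds float64
	if err := json.Unmarshal(data, &seconds); err != nil {
		return err
	}

	// Split into whole seconds and nanoseconds to keep sub-second precision.
	sec := int64(seconds)
	nsec := int64((seconds - float64(sec)) * float64(time.Second))
	*t = UnixFloat(time.Unix(sec, nsec))
	return nil
}

// Time returns the underlying time.Time value.
func (t UnixFloat) Time() time.Time { return time.Time(t) }
```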
Some values are returned as constant integer values, e.g., 0 for a service in the OK state. Those known integers are now replaced by consts.
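For example, constants along these lines (the numeric values are Icinga 2's host and service states; the constant names are illustrative):

```go
package stateexample

// The Icinga 2 API encodes host and service states as integers. Named
// constants instead of magic numbers make comparisons self-explanatory.
const (
	StateHostUp   = 0
	StateHostDown = 1

	StateServiceOk       = 0
	StateServiceWarning  = 1
	StateServiceCritical = 2
	StateServiceUnknown  = 3
)
```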
The catch-up-phase logic was extended to also propagate an error back if the phase ended unsuccessfully. In this case - and not when another catch-up-phase was requested and the old phase worker got canceled - another attempt will be made.
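A hedged sketch of that distinction, with hypothetical names: retry only when the worker returned a genuine error, not when it was canceled because a newer catch-up phase (or a shutdown) superseded it.

```go
package catchupexample

import (
	"context"
	"errors"
	"time"
)

// runCatchUpPhase retries the catch-up worker on genuine failures only.
func runCatchUpPhase(ctx context.Context, worker func(context.Context) error) {
	for {
		err := worker(ctx)
		if err == nil {
			return // catch-up phase finished successfully
		}
		if errors.Is(err, context.Canceled) || ctx.Err() != nil {
			return // superseded by a newer phase or shutting down: no retry
		}

		// Genuine failure (e.g. network disruption): back off briefly, retry.
		select {
		case <-ctx.Done():
			return
		case <-time.After(time.Second):
		}
	}
}
```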
After being asked something about the config, I noticed that it would probably be a good idea to have a separate config attribute for the Icinga 2 endpoint name, i.e. the name expected in the certificate. Given that both are configured separately in Icinga 2 (and you have to specify both, one isn't implicitly used for the other), it's quite possible to have configs like this which wouldn't be possible with the current implementation here without disabling certificate validation altogether:
icinga2_common_name, or Icinga2CommonName in Go, allows overriding the expected Common Name of the Certificate from the Icinga 2 API.

For testing, I acquired the CA's PEM by:

> openssl s_client \
> -connect docker-master:5665 \
> -showcerts < /dev/null 2> /dev/null \
> | awk '/BEGIN CERTIFICATE/ || p { p = 1; print } /END CERTIFICATE/ { exit }'

and populated the source table as follows:

> UPDATE source SET
> icinga2_ca_pem = $$-----BEGIN CERTIFICATE-----
> [ . . . ]
> -----END CERTIFICATE-----$$,
> icinga2_common_name = 'docker-master',
> icinga2_insecure_tls = 'n';

Afterwards, one can verify the check by altering icinga2_common_name either to NULL or an invalid common name.
@julianbrost: A distinct Common Name was implemented in 24a4843, with additional information regarding adjusting one's testing environment in the commit message.
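For illustration, one way such a common-name override can be enforced with Go's standard `crypto/tls` is to set `ServerName`, which decouples certificate verification from the address actually dialed; this is a sketch of the mechanism, not necessarily how 24a4843 implements it, and all names are made up:

```go
package tlsexample

import (
	"crypto/tls"
	"crypto/x509"
	"net/http"
)

// newAPIClient builds an HTTP client whose certificate verification uses the
// configured common name instead of the host part of the URL, e.g. verifying
// against "docker-master" while connecting to a plain IP address.
func newAPIClient(caPEM []byte, commonName string, insecure bool) *http.Client {
	tlsCfg := &tls.Config{InsecureSkipVerify: insecure}

	if len(caPEM) > 0 {
		pool := x509.NewCertPool()
		pool.AppendCertsFromPEM(caPEM)
		tlsCfg.RootCAs = pool
	}

	if commonName != "" {
		// Verify the peer certificate against this name, not the dialed address.
		tlsCfg.ServerName = commonName
	}

	return &http.Client{Transport: &http.Transport{TLSClientConfig: tlsCfg}}
}
```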
I noticed a small detail that would be pretty helpful: setting the user agent in all the requests. Icinga 2 logs the user agent for each request, so it would be helpful to easily identify the requests in these logs.
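Setting the header itself is a one-liner per request; a minimal sketch (the User-Agent string and helper name are made up):

```go
package uaexample

import (
	"context"
	"net/http"
)

// userAgent is a hypothetical identifier; a real build would likely embed the
// daemon name and version here.
const userAgent = "icinga-notifications/0.1"

// newRequest sets the User-Agent header on every request so that the requests
// are easy to identify in Icinga 2's log.
func newRequest(ctx context.Context, method, url string) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, method, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent)
	return req, nil
}
```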
- Rate limit catch-up-phase worker start. In case of a network disruption during the catch-up-phase, this will result in an error and infinite retries. Those, however, might result in lots of useless logging, which can be rate limited (sketched below).
- Remove the catchupEventCh drainage logic, which was both useless and broken. All sends are protected by context checks.
- Abort early on errors received from the catchupEventCh and don't store them for later.
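One way to rate limit those worker starts, shown here as a sketch using `golang.org/x/time/rate`; the package choice and all names are assumptions, not necessarily what the PR does:

```go
package ratelimitexample

import (
	"context"
	"time"

	"golang.org/x/time/rate"
)

// startCatchUpWorkers starts at most one catch-up worker per interval, even if
// the previous attempt fails immediately (e.g. during a network disruption),
// which also bounds the amount of error logging.
func startCatchUpWorkers(ctx context.Context, runOnce func(context.Context) error) {
	// Allow one start per 10 seconds, with no extra burst.
	limiter := rate.NewLimiter(rate.Every(10*time.Second), 1)

	for {
		if err := limiter.Wait(ctx); err != nil {
			return // context canceled while waiting
		}
		if err := runOnce(ctx); err == nil || ctx.Err() != nil {
			return
		}
		// On error: loop around; the limiter delays the next start.
	}
}
```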
Took quite some time (and I have a good part in that 😅), but now it LGTM (you've got to put this in a review for a PR of this size).
My tests[^1] were successful. Besides that, I found #171, which is not caused by this PR, though it affects its functionality. I didn't test every last detail, but the big picture is working, so if something is left, that would be for another PR; this one has been sitting around here for long enough now.
Footnotes

[^1]: different configs in the `source` table (including certificate validation), handling of Icinga 2 restart, state and ack events during catch-up and normal operation
Fetch events from the Icinga 2 API. For details, consider the commit messages.
Closes #89.
refs Icinga/icinga-notifications-web#156 (corresponding configuration interface)