Reporting errors in Argus itself #88

Open
hmpf opened this issue Aug 3, 2020 · 6 comments
Labels
blocked (Another thing/issue has to be resolved before tackling this); discussion (Requires developer feedback/discussion before implementation); notification (Affects the notification system); priority: low

@hmpf
Contributor

hmpf commented Aug 3, 2020

Argus should be able to report certain errors about itself as incidents. For instance, a failure to send a notification because the notification endpoint isn't answering (the email server is down, say) should be reported as an incident.

This means we need a SourceType "argus" and a SourceSystem representing the host argus is running on. Named "self" maybe? "me"? I suspect using the hostname would be tricky. We also need a function/method argus can use to write to the incidents table, with SourceType/SourceSystem locked.

(This is very nice, because we can dogfood the system using itself, triggering errors in argus in order to have incidents turn up in argus :) )
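A minimal sketch of the "locked" function idea, using plain dataclasses rather than the real Argus Django models. Every name here (`SourceSystem`, `Incident`, `create_incident`, `report_self_incident`, the `level` field and its default) is a hypothetical stand-in chosen for illustration, not Argus's actual API:

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the real models, which would be Django models.
@dataclass(frozen=True)
class SourceSystem:
    name: str         # e.g. "self"
    source_type: str  # e.g. "argus"

@dataclass
class Incident:
    source: SourceSystem
    description: str
    level: int

# In-memory stand-in for the incidents table.
INCIDENTS = []

def create_incident(source, description, level):
    """General-purpose writer: the caller chooses the source."""
    incident = Incident(source=source, description=description, level=level)
    INCIDENTS.append(incident)
    return incident

# The one source that Argus uses when reporting on itself.
ARGUS_SELF = SourceSystem(name="self", source_type="argus")

def report_self_incident(description, level=5):
    """Writer with the source locked: callers cannot pass a SourceSystem."""
    return create_incident(ARGUS_SELF, description, level)
```

Since `report_self_incident` takes no source argument, code inside Argus cannot accidentally file a self-report under another source system.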

@hmpf hmpf added the discussion Requires developer feedback/discussion before implementation label Aug 3, 2020
@hmpf
Contributor Author

hmpf commented Aug 3, 2020

These objects should be get_or_created early and easily and often. Maybe a management command named "setup" or "verify" or something, or get_or_created on each use of the function.

The first use of the function could be when setting up the system for the first time, a "Hello, World" incident, low severity!

@hmpf hmpf added this to the blue sky milestone Sep 29, 2020
@hmpf hmpf added the notification Affects the notification system label Sep 29, 2020
@hmpf
Contributor Author

hmpf commented Oct 26, 2020

There is now a way to auto-create an argus user/source/source type. What's left is to create a suitable incident every time argus complains about something in its logs.
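One possible way to hook into the logs is a custom handler from Python's standard `logging` module. The sketch below is an assumption about how this might be wired, not the project's implementation: `SelfIncidentHandler`, the `report` callable, and the logger name `argus.example` are made up for illustration, and a list stands in for incident creation:

```python
import logging

class SelfIncidentHandler(logging.Handler):
    """Pass each sufficiently severe log record to a reporting callable."""

    def __init__(self, report, level=logging.ERROR):
        super().__init__(level=level)
        self._report = report  # hypothetical: would create an incident

    def emit(self, record):
        try:
            self._report(self.format(record))
        except Exception:
            # Never let a failed report raise inside the logging call.
            self.handleError(record)

# Usage: a list collects what would otherwise become incidents.
reported = []
logger = logging.getLogger("argus.example")
logger.propagate = False
logger.addHandler(SelfIncidentHandler(reported.append))

logger.warning("below the handler threshold, not reported")
logger.error("notification endpoint not answering")
```

After this runs, `reported` holds only the error message, since the handler's level filters out the warning.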

@hmpf hmpf added the good first issue Good for newcomers label Oct 26, 2020
@katsel
Contributor

katsel commented Jan 18, 2021

This feature sounds very useful!

Still, I am a bit worried that, in certain cases, Argus might overload itself with its own error messages if this were implemented naively.
For example: an error triggers an incident, which matches a filter and is sent out by mail, which causes another error that triggers another incident, ad infinitum. Congrats on DoSing yourself and/or taking a whole Argus instance down.

So, two requirements should be met before implementation:

  1. A clear, exhaustive, written spec of which errors can cause incidents and which cannot.
  2. A mechanism to prevent choking on its own incidents.
    Some filtering message queue, or another mechanism for rate limiting.
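As a rough illustration of the second requirement, here is a sketch of a small in-process limiter that drops repeated errors and caps the total number of self-reported incidents per time window. It is one possible mechanism under stated assumptions, not a proposal from the thread: the class name, the default limits, and the idea of keying on an error string are all hypothetical.

```python
import time
from collections import deque

class SelfIncidentLimiter:
    """Allow at most max_per_window reports per window, one per error key."""

    def __init__(self, max_per_window=5, window_seconds=60.0,
                 clock=time.monotonic):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._clock = clock       # injectable so tests can fake time
        self._accepted = deque()  # timestamps of accepted reports
        self._last_seen = {}      # error key -> time it was last accepted

    def allow(self, key):
        now = self._clock()
        cutoff = now - self.window_seconds
        # Forget everything that has fallen out of the window.
        while self._accepted and self._accepted[0] <= cutoff:
            self._accepted.popleft()
        self._last_seen = {
            k: t for k, t in self._last_seen.items() if t > cutoff
        }
        if key in self._last_seen:
            return False  # same error already reported in this window
        if len(self._accepted) >= self.max_per_window:
            return False  # overall cap reached; drop instead of looping
        self._accepted.append(now)
        self._last_seen[key] = now
        return True
```

A caller would check `limiter.allow(error_key)` before creating a self-incident, so the error, incident, mail, error loop described above is bounded by the cap instead of running forever. It does not replace the written spec from the first requirement.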

@katsel katsel removed the good first issue Good for newcomers label Jan 18, 2021
@katsel
Contributor

katsel commented Jan 18, 2021

Removing the "good first issue" tag because of the concerns above.
The actual code change may be easy to make, but it seems wise to reduce the risk a bit before tackling an implementation.

@katsel katsel added the blocked Another thing/issue has to be resolved before tackling this label Jan 18, 2021
@katsel
Contributor

katsel commented Jan 19, 2021

  • Audit logs are another source of debugging information, so this one is low-priority.

Another approach would be sending a notification through Argus without creating an incident.
Details to be discussed later.

@johannaengland
Contributor

This came up again in #760.

3 participants