Support Escalations #2

Open
dominikschulz opened this issue Jun 10, 2015 · 13 comments

Comments

@dominikschulz
Collaborator

Please consider supporting groups of people in which notifications may be escalated.

This would allow several people to be on call for a given topic. If one fails to respond to the notifications the next one would be notified and so on.

It should be pretty easy to implement by adding a team API and wrapping the notification loop in another loop that iterates over the team members.

@chrissnell
Owner

This is definitely on the to-do list and near the top. Looking at the model, you have a rotation that's defined by:

  • type of shift:
    • full day shift (rotation no sooner than 24 hours after shift start)
      • frequency of rotation (e.g. 1 week)
      • day and time of rotation (e.g. every Monday at 0800)
    • partial day shift (for rotations < 24 hours apart)
      • frequency of rotation (e.g. 12 hours)
    • something else (user A covers weekdays, user B covers weekends)
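A minimal sketch of how the rotation model above might be expressed in Go. Every name here (`ShiftType`, `Rotation`, `Valid`, `OnCall`) is hypothetical and not taken from the Chicken Little code base, and the "something else" case (weekday/weekend coverage) is deliberately left out.

```go
package main

import "time"

// ShiftType distinguishes the kinds of rotation listed above (hypothetical).
type ShiftType int

const (
	FullDayShift    ShiftType = iota // rotations at least 24 hours apart
	PartialDayShift                  // rotations less than 24 hours apart
)

// Rotation is an assumed description of an on-call rotation.
type Rotation struct {
	Type      ShiftType
	Frequency time.Duration // e.g. one week, or 12 hours
	Start     time.Time     // first handover, e.g. a Monday at 08:00
	Members   []string      // on-call order
}

// Valid reports whether Frequency is consistent with the shift type.
func (r Rotation) Valid() bool {
	day := 24 * time.Hour
	switch r.Type {
	case FullDayShift:
		return r.Frequency >= day
	case PartialDayShift:
		return r.Frequency > 0 && r.Frequency < day
	}
	return false
}

// OnCall returns the member on call at time t, or "" if nobody is.
func (r Rotation) OnCall(t time.Time) string {
	if len(r.Members) == 0 || r.Frequency <= 0 || t.Before(r.Start) {
		return ""
	}
	shifts := int(t.Sub(r.Start) / r.Frequency)
	return r.Members[shifts%len(r.Members)]
}
```

With two members and a 12-hour frequency, `OnCall` alternates between them every 12 hours starting at `Start`.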

Separate from the rotation itself, you have an escalation policy that defines what to do if the primary doesn't answer:

  • Escalate to a backup (if so, who?)
  • Page everyone
  • Execute webhook
  • Send email

Also need to figure out when you escalate. This should probably be selectable:

  • After step N of the notification plan fails to get a response
  • After all steps of a notification plan have been executed at least once with no response
  • After N minutes
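The two lists above (what to do, and when to do it) could be combined into a single policy value. The following is only a sketch under that assumption; `EscalationPolicy`, `AlertState`, and the constant names are invented for illustration.

```go
package main

import "time"

// EscalationAction is what happens when the primary does not answer.
type EscalationAction int

const (
	EscalateToBackup EscalationAction = iota
	PageEveryone
	ExecuteWebhook
	SendEmail
)

// EscalationTrigger is the condition that starts the escalation.
type EscalationTrigger int

const (
	AfterStepFails     EscalationTrigger = iota // step N got no response
	AfterPlanExhausted                          // every step ran once, no response
	AfterElapsed                                // a fixed time has passed
)

// EscalationPolicy is a hypothetical pairing of an action with a trigger.
type EscalationPolicy struct {
	Action  EscalationAction
	Target  string // backup person, webhook URL, or email address
	Trigger EscalationTrigger
	Step    int           // used with AfterStepFails (1-based)
	Wait    time.Duration // used with AfterElapsed
}

// AlertState is the progress of a still unacknowledged alert.
type AlertState struct {
	StepsFailed int // notification steps that got no response so far
	TotalSteps  int // steps in the notification plan
	Elapsed     time.Duration
}

// ShouldEscalate reports whether the trigger condition has been met.
func (p EscalationPolicy) ShouldEscalate(s AlertState) bool {
	switch p.Trigger {
	case AfterStepFails:
		return s.StepsFailed >= p.Step
	case AfterPlanExhausted:
		return s.StepsFailed >= s.TotalSteps
	case AfterElapsed:
		return s.Elapsed >= p.Wait
	}
	return false
}
```

Keeping the trigger separate from the action means each of the three "when" options can be combined with any of the four "what" options.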

@dominikschulz
Collaborator Author

When talking about on-call rotations and escalations, people tend to get really creative. I'd suggest keeping things simple and first introducing a team API. A team would have a name and an ordered list of members (people).

This team API would provide a notify endpoint like the people API does. Triggering this endpoint would signal each member of the team in order until someone acknowledges the notification (hopefully the first one).
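As a rough illustration of the proposal, the outer loop over team members might look like the sketch below. `Team`, `Notifier`, and `ErrNoAck` are assumed names; the `Notifier` callback stands in for whatever the existing people notification code does.

```go
package main

import "errors"

// Team is the simple model proposed here: a name and ordered members.
type Team struct {
	Name    string
	Members []string // notified in this order
}

// Notifier notifies one person and reports whether they acknowledged.
// It is a placeholder for the existing per-person notification logic.
type Notifier func(person string) (acked bool)

// ErrNoAck is returned when no member acknowledged the notification.
var ErrNoAck = errors.New("no team member acknowledged")

// Notify signals each member in order until one acknowledges,
// and returns the name of the member who did.
func (t Team) Notify(notify Notifier) (string, error) {
	for _, member := range t.Members {
		if notify(member) {
			return member, nil
		}
	}
	return "", ErrNoAck
}
```

If the second of three members acknowledges, the third is never contacted.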

This way teams would still be optional and rather flexible. When implementing https://github.com/dominikschulz/Monitoring-Spooler we chose to not handle automatic shift changes. Instead we let the new operator taking over the shift initiate the change. This way the whole "type of shift" logic would be unnecessary.

@chrissnell
Owner

So now we have a Team API. Now I need to figure out how I want to implement the notification engine for teams. It will probably run in its own goroutine similar to StartNotificationEngine() in notification.go. The current StartNotificationEngine() function may get renamed to something like StartPeopleNotificationEngine() and then we have a StartTeamNotificationEngine() which takes notification requests for teams and notifies/escalates as appropriate.

@dominikschulz
Collaborator Author

I am currently looking into different ways of implementing the team notifications. I will open up another PR when and if I have something that looks good to me.

@chrissnell
Owner

I'm going to work on the implementation tonight, but feel free to send PRs if you want. I had a thought last night: the JSON that is submitted for team creation is huge and ugly because we are including the rotation and escalation details in there. I think these should probably be broken out into their own separate APIs (like I did for notification steps). So, we have the Team struct (minus the Rotation and EscalationSteps struct members) and then a Rotation struct and an EscalationSteps struct. They could each have their own API if it makes sense. It's a lot of extra code and docs, but it may simplify things from the client's perspective if we avoid having the client create some massive, multi-level JSON struct to set up teams and rotations.
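To make the idea concrete, here is a sketch of what the split might look like. The field names and JSON keys are assumptions made for illustration, not the project's actual schema; the point is only that each resource becomes a small, flat document that refers to the team by name.

```go
package main

import "encoding/json"

// Team no longer embeds rotation or escalation details.
type Team struct {
	Name        string   `json:"name"`
	Description string   `json:"description,omitempty"`
	Members     []string `json:"members"`
}

// Rotation is submitted separately and refers to its team by name.
type Rotation struct {
	Team      string `json:"team"`
	Frequency string `json:"frequency"` // e.g. "168h"
}

// EscalationStep is likewise its own resource.
type EscalationStep struct {
	Team   string `json:"team"`
	Order  int    `json:"order"`
	Method string `json:"method"` // e.g. "notify_next" (hypothetical value)
	Target string `json:"target,omitempty"`
}

// encodeTeam renders the body a client would submit to create a team.
func encodeTeam(t Team) (string, error) {
	b, err := json.Marshal(t)
	return string(b), err
}
```

A client would then create the team with one short request and attach the rotation and escalation steps in follow-up requests.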

@chrissnell
Owner

I think that StartNotificationEngine() could be enhanced to watch for team notifications in addition to the person notifications that it currently gets through planChan. We would add an additional case statement to receive team notification events. When this case was called, it would act similarly to the <-planChan case: it would set up a stopper channel and launch a goroutine to handle the team's notification/escalation plan.
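A sketch of that select loop, assuming a second channel for team requests. Only `planChan` and the idea of a stopper channel come from the discussion; `teamChan`, `stopChan`, `Request`, `Handler`, and `runEngine` are hypothetical, and the real `StartNotificationEngine()` signature may differ.

```go
package main

// Request is a hypothetical notification request for a person or a team.
type Request struct {
	ID     string // alert identifier
	Target string // person or team name
}

// Handler runs a notification or escalation plan until stop is closed.
type Handler func(r Request, stop <-chan struct{})

// Engine holds the channels the engine goroutine selects on.
type Engine struct {
	planChan chan Request  // person notifications
	teamChan chan Request  // team notifications (assumed addition)
	stopChan chan string   // alert IDs whose plans should stop
	quit     chan struct{} // closed to shut the engine down
}

// runEngine dispatches person and team requests to their handlers.
func runEngine(e *Engine, person, team Handler) {
	// stoppers is only touched by this goroutine, so it needs no lock.
	stoppers := make(map[string]chan struct{})
	launch := func(h Handler, r Request) {
		stopper := make(chan struct{})
		stoppers[r.ID] = stopper
		go h(r, stopper)
	}
	for {
		select {
		case r := <-e.planChan:
			launch(person, r)
		case r := <-e.teamChan: // the additional case for teams
			launch(team, r)
		case id := <-e.stopChan:
			if stopper, ok := stoppers[id]; ok {
				close(stopper)
				delete(stoppers, id)
			}
		case <-e.quit:
			return
		}
	}
}
```

Both kinds of request share the same stopper bookkeeping, so acknowledging a team alert can reuse the path that already stops a person's plan.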

@chrissnell
Owner

So I'm in the middle of breaking out escalations and rotations into their own APIs. I will push the work-in-progress branch to GitHub shortly. New branch here.

My new branch introduces the concept of an EscalationPlan, which is very similar to a Person's NotificationPlan, except that it applies to teams. One thing we need to think about is the interaction between the execution of the escalation plan and the individual notification plans. Right now, Chicken Little notification plans just execute until acknowledged. Soon, we will add team escalation plans, which constitute a series of individual notification plans. We will need a way for those individual notifications to signal up to the team escalation plan to stop because somebody has acknowledged the alert.

I propose that we assign team alerts a UUID just like we assign them to individual alerts. When a team escalation executes an individual alert as part of the escalation plan, the team alert UUID will be passed as a parameter to the individual alert. If the individual acknowledges their alert, the individual alert executor sends the team alert UUID back to the notification engine so that the team alert can be stopped.
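The proposal could be sketched roughly as follows. This is a simplified, synchronous illustration: the acknowledgement is sent over a plain channel instead of going through the notification engine, the individual alert IDs are placeholders rather than real UUIDs, and `Alert`, `Executor`, `runIndividual`, and `escalate` are hypothetical names.

```go
package main

import "strconv"

// Alert is an individual alert, optionally tied to a team alert.
type Alert struct {
	UUID     string
	TeamUUID string // empty for stand-alone individual alerts
	Person   string
}

// Executor runs one person's notification plan and reports an ack.
type Executor func(a Alert) (acked bool)

// runIndividual executes one alert. If the person acknowledges and the
// alert belongs to a team alert, the team UUID is reported upward.
func runIndividual(a Alert, exec Executor, teamAcks chan<- string) bool {
	acked := exec(a)
	if acked && a.TeamUUID != "" {
		teamAcks <- a.TeamUUID
	}
	return acked
}

// escalate alerts members in order and stops as soon as the team UUID
// comes back on teamAcks. It returns the members that were alerted.
func escalate(teamUUID string, members []string, exec Executor) []string {
	teamAcks := make(chan string, 1) // buffered so the send never blocks
	var alerted []string
	for i, m := range members {
		a := Alert{
			UUID:     teamUUID + "/" + strconv.Itoa(i), // placeholder ID
			TeamUUID: teamUUID,
			Person:   m,
		}
		alerted = append(alerted, m)
		runIndividual(a, exec, teamAcks)
		select {
		case id := <-teamAcks:
			if id == teamUUID {
				return alerted // somebody acknowledged; stop escalating
			}
		default: // no ack yet; move on to the next member
		}
	}
	return alerted
}
```

Because every individual alert carries the team UUID, the code that handles an acknowledgement needs no knowledge of team membership to stop the right escalation.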

dominikschulz added a commit to dominikschulz/chickenlittle that referenced this issue Jul 1, 2015
As discussed in chrissnell#2 this commit removes the team logic from master
until it's completely finished and gets merged again.
@dominikschulz
Collaborator Author

With PR #13 this branch should be more or less feature complete (wrt. team escalations).

@chrissnell While some tests and documentation are still missing, I'd like to hear your opinion on the current state of this branch. Is anything important missing, and what state should we aim for before merging this into master?

@dominikschulz
Collaborator Author

@chrissnell Now that everything I consider necessary is in the teamescalation branch, I'd like to start discussing what is still missing before we merge it into master.

@chrissnell
Owner

Sorry it's taken me so long to respond. I've been traveling and then work got busy. If you're ready to merge this, please do; you have commit rights. We do need some documentation, however, on how teams and team escalation work. I also want to start working on a "Quick Start" guide, but that's not necessary to merge your branch.

@dominikschulz
Collaborator Author

Never mind, I just wanted to wait for feedback before merging this into master.

Unless I find any blocking issues in the next few days, I'll merge the teamescalation branch into master.

@chrissnell
Owner

Hi Dominik, do you still want to merge this?

@dominikschulz
Collaborator Author

Hi, I was pretty busy lately. Right now I can't say if I can put any more effort into this project or not.

If it's OK with you, I'd suggest leaving this PR open until I make up my mind. Otherwise feel free to close it.
