You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We have a couple of incidents that the source system cannot close (ie END event). When sending an END event, the event is happily accepted, and nothing happens. Upon further investigation, I tracked it to the following code in EventViewSet.perform_create
try:
self.validate_event_type_for_incident(event_type, incident)
exceptserializers.ValidationErrorase:
# Allow any event from source systems (as long as the user is allowed to post the event type),# even if the posted type is invalid for the incident - like if an `INCIDENT_END` event# is sent after the incident has been manually closedifnotuser.is_source_system:
raisee
In case of an error (ie The incident already has a related event of type 'END'.) the end event is created, but not applied. The associated exception is swallowed. It would be really nice if instead of just dropping the exception, at least a message would be logged.
The incidents that can't be ENDed currently have an end_time == infinity and should be able to be close. Apparently something went wrong during the first END event, although I haven't figured out what yet. Unfortunately our logs don't go back long enough to see what happened the first time the incident was supposed to be ENDed so I can't pinpoint what was the issue there
I think that sending an END event now should still close the incident. Maybe it's better to look at the current end time, and end the incident if end_time==infinity, regardless of whether an END event was previously created. (Perhaps the same holds true for START events, but I'm not sure). Now we've ended up (pun intended) in a situation where an incident cannot be closed by the source system.
So, my suggestions
log a message when an event is accepted but ignored
update the logic on when an end event is excecuted based on its current end_time
The text was updated successfully, but these errors were encountered:
Absolutely there should be logs. I can see several reasons why this happens but we need to log to find others.
A cause I can see: Both END and CLOSED sets end_time. An event can be CLOSED before a source system sets END so then I would say: let the END be created but keep the end_time as is, if it isn't already like that.
Another cause I can see: If end_time is still infinity even if there is an END- or CLOSE-event then we have been bitten by not having been atomic enough (those will be easier once we are rid of the current way of sending notifications. Not this year).
A third cause: race-conditions.
Fix: on a new END-event where there is already an END/CLOSED and end_time is still "infinity":
If there are no REOPEN events, set end_time to the timestamp of the oldest END/CLOSED. If there are REOPEN events, ignore END/CLOSE happening before the latest REOPEN and look at the oldest remaining END/CLOSED. There's hopefully a better way to set a write-lock on the incident-row instead of the "atomic" decorator/context manager.
We could maybe have a data migration to repair existing such in the database, and the actual code doing it should be a utils function so that it can also be run from elsewhere, like a management command or admin.
If however there is already a CLOSED-event, end_time not infinity and another CLOSED-event arrives, that sounds like a bug or a hack-attempt (probably trivial via the admin, it has not been locked down I suspect). Log, including user pk, and toss/raise error.
hmpf
linked a pull request
Nov 15, 2024
that will
close
this issue
We have a couple of incidents that the source system cannot close (ie END event). When sending an END event, the event is happily accepted, and nothing happens. Upon further investigation, I tracked it to the following code in
EventViewSet.perform_create
In case of an error (ie
The incident already has a related event of type 'END'.
) the end event is created, but not applied. The associated exception is swallowed. It would be really nice if instead of just dropping the exception, at least a message would be logged.The incidents that can't be ENDed currently have an
end_time == infinity
and should be able to be close. Apparently something went wrong during the first END event, although I haven't figured out what yet. Unfortunately our logs don't go back long enough to see what happened the first time the incident was supposed to be ENDed so I can't pinpoint what was the issue thereI think that sending an END event now should still close the incident. Maybe it's better to look at the current end time, and end the incident if
end_time==infinity
, regardless of whether an END event was previously created. (Perhaps the same holds true for START events, but I'm not sure). Now we've ended up (pun intended) in a situation where an incident cannot be closed by the source system.So, my suggestions
The text was updated successfully, but these errors were encountered: