In some cases of network segmentation (aka network partition), Ignite is able to detect that the segmentation has happened and fires a corresponding event. Once network segmentation has occurred on a node, that node becomes unusable: it cannot reconnect to the topology anymore.
By default, Ignite attempts to shut down the segmented node after this event is fired. However, while emulating node segmentation by stalling the JVM, we observed that the node can hang indefinitely during this shutdown. In that case other threads kept working on the node: for example, the Reader component continued to poll Kafka for data and tried to communicate with the Lead. That communication fails, because after a segmentation the node has left the cluster. If this happens in a production environment, the whole cluster will stall: the segmented node keeps consuming transactions from Kafka, while the Lead cannot see them and therefore cannot continue planning.
Our code should listen for this event, issue an external notification about the network segmentation (e.g. by sending an email), and then attempt to shut down the corrupted node. If System.exit(...) does not suffice, we could try to determine the JVM process id and use an OS-specific kill command.
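A minimal sketch of the escalating shutdown described above, using only the JDK (Java 9+ for `ProcessHandle`). The class name `SegmentedNodeKiller`, the `notifier` callback (standing in for the email notification) and the `graceMillis` timeout are hypothetical. The assumption is that a stuck shutdown hook is what keeps `System.exit(...)` from returning, so a daemon watchdog escalates to an OS kill of the JVM's own PID and finally to `Runtime.halt(...)`, which skips shutdown hooks entirely.

```java
import java.io.IOException;
import java.util.List;
import java.util.Locale;

/**
 * Sketch of an escalating shutdown for a segmented node:
 * notify, then System.exit, then an OS kill of our own PID, then Runtime.halt.
 */
public final class SegmentedNodeKiller {
    private SegmentedNodeKiller() {}

    /** Builds an OS-specific command that forcibly kills the process with the given id. */
    static List<String> killCommand(long pid, String osName) {
        String id = Long.toString(pid);
        if (osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
            return List.of("taskkill", "/F", "/PID", id);
        }
        return List.of("kill", "-9", id); // SIGKILL on Linux/macOS
    }

    /**
     * Runs the (hypothetical) notifier, then terminates the JVM. A daemon watchdog
     * escalates if System.exit has not finished within graceMillis, e.g. because a
     * shutdown hook is stuck.
     */
    public static void shutdown(Runnable notifier, long graceMillis) {
        try {
            notifier.run(); // e.g. send an email; supplied by the caller
        } catch (RuntimeException e) {
            e.printStackTrace(); // a failed notification must not prevent the shutdown
        }

        Thread watchdog = new Thread(() -> {
            sleepQuietly(graceMillis);
            long pid = ProcessHandle.current().pid();
            try {
                new ProcessBuilder(killCommand(pid, System.getProperty("os.name")))
                        .inheritIO()
                        .start();
            } catch (IOException e) {
                e.printStackTrace(); // could not spawn the kill command; fall through to halt
            }
            sleepQuietly(graceMillis);
            Runtime.getRuntime().halt(1); // last resort: skips shutdown hooks entirely
        }, "segmentation-watchdog");
        watchdog.setDaemon(true);
        watchdog.start();

        System.exit(1); // normal path: runs shutdown hooks, which may hang
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The OS kill comes before `halt` only because `Runtime.halt(...)` never returns normally, so anything placed after it would never run; swapping the order (halt first, no external process at all) is an equally reasonable choice.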
Note: the event should be listened for using localListen(...); the listener will be notified on the segmented node itself.
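A sketch of registering such a listener, assuming `ignite-core` is on the classpath. It uses `IgniteEvents.localListen(...)` with `EventType.EVT_NODE_SEGMENTED`; the class `SegmentationGuard` and the `onSegmented` callback are hypothetical. Depending on the Ignite version and configuration, the event type may also need to be enabled via `IgniteConfiguration.setIncludeEventTypes(...)`. The listener itself only starts a separate thread, on the assumption that notification and shutdown should not run on (and possibly block) the Ignite thread delivering the event.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

/** Sketch: react to EVT_NODE_SEGMENTED on the segmented node itself. */
public final class SegmentationGuard {
    private SegmentationGuard() {}

    /** Builds a listener that hands the actual work to a separate thread. */
    static IgnitePredicate<Event> listener(Runnable onSegmented) {
        return evt -> {
            // Do not block the Ignite thread that delivers the event:
            // notification and shutdown may take a while or hang.
            Thread handler = new Thread(onSegmented, "segmentation-handler");
            handler.setDaemon(false);
            handler.start();
            return false; // unsubscribe: the node cannot rejoin the topology anyway
        };
    }

    /** Registers the listener locally (localListen), so it fires on the segmented node. */
    public static void install(Ignite ignite, Runnable onSegmented) {
        ignite.events().localListen(listener(onSegmented), EventType.EVT_NODE_SEGMENTED);
    }
}
```

`onSegmented` is where a notify-then-terminate routine would be plugged in, e.g. `SegmentationGuard.install(ignite, () -> notifyAndShutDown())`, where `notifyAndShutDown` is a hypothetical method of our own code.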