[DEV] Handle Ignite network segmentation events #204

YevIgn · 2017-04-26T08:17:30Z

In some cases of network segmentation (aka network partition), Ignite is able to detect that segmentation has occurred and fires a corresponding event. Once a node has been segmented, it becomes unusable: it can no longer reconnect to the topology.

By default, Ignite attempts to shut down the segmented node after this event is fired. However, while emulating node segmentation caused by a stalled JVM, we observed that the node can hang indefinitely during this shutdown. In that case other threads kept working on the node: for example, the Reader component continued to poll Kafka for data and tried to communicate with the Lead. That communication failed, because after segmentation the node has left the cluster. If this happens in a production environment, the whole cluster will stall: the segmented node keeps consuming transactions from Kafka, while the Lead cannot see them and therefore cannot continue planning.

Our code should listen for this event, send an external notification about the network segmentation (e.g. an email), and then attempt to shut down the corrupted node. If System.exit(...) does not suffice, we could try to determine the JVM process ID and use an OS-specific kill command.
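A minimal sketch of that shutdown sequence in plain Java, with no Ignite dependency. The class name `SegmentedNodeTerminator`, the notifier callback and the grace period are all hypothetical. It arms a watchdog thread first, runs the notification as best effort, calls `System.exit`, and if the JVM is still alive when the grace period expires it calls `Runtime.halt`, which terminates immediately without running shutdown hooks. The `killCommand` helper only builds the OS-specific command line mentioned above; it does not execute it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.IntConsumer;

/**
 * Hypothetical helper: notify about segmentation, attempt an orderly exit,
 * and escalate to Runtime.halt if the JVM is still alive after a grace period.
 */
final class SegmentedNodeTerminator {
    private final Runnable notifier;   // e.g. sends an email; supplied by the caller (hypothetical)
    private final long graceMillis;    // how long the orderly shutdown may take
    private final IntConsumer exit;    // System::exit in production
    private final IntConsumer halt;    // Runtime.getRuntime()::halt in production

    SegmentedNodeTerminator(Runnable notifier, long graceMillis) {
        this(notifier, graceMillis, System::exit, Runtime.getRuntime()::halt);
    }

    // Injectable exit/halt actions make the sequence testable without killing the JVM.
    SegmentedNodeTerminator(Runnable notifier, long graceMillis, IntConsumer exit, IntConsumer halt) {
        this.notifier = notifier;
        this.graceMillis = graceMillis;
        this.exit = exit;
        this.halt = halt;
    }

    void terminate(int status) {
        // Arm the watchdog first so a hanging notifier or shutdown hook cannot block termination.
        Thread watchdog = new Thread(() -> {
            try {
                Thread.sleep(graceMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // Grace period over (or watchdog interrupted): stop the JVM without running hooks.
            halt.accept(status);
        }, "segmentation-halt-watchdog");
        watchdog.setDaemon(true);
        watchdog.start();

        try {
            notifier.run();
        } catch (RuntimeException e) {
            // Notification is best effort; termination must proceed regardless.
        }
        // In production this blocks while shutdown hooks run and never returns normally.
        exit.accept(status);
    }

    /** Builds a last-resort OS command that force-kills the process with the given PID. */
    static List<String> killCommand(long pid, String osName) {
        String id = Long.toString(pid);
        if (osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
            return Arrays.asList("taskkill", "/F", "/PID", id);
        }
        return Arrays.asList("kill", "-9", id);
    }
}
```

In production the two-argument constructor wires in `System::exit` and `Runtime.getRuntime()::halt`. On Java 9+, the external fallback could be launched with `new ProcessBuilder(SegmentedNodeTerminator.killCommand(ProcessHandle.current().pid(), System.getProperty("os.name"))).start()`; older JVMs need another way to obtain the PID.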

Note: the event should be listened for using localListen(...); the listener will be notified on the segmented node itself.
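A minimal sketch of registering such a listener, assuming the Apache Ignite 2.x Java API (`IgniteEvents.localListen`, `EventType.EVT_NODE_SEGMENTED`, `SegmentationPolicy`). Switching the policy from the default `STOP` to `NOOP`, so that our listener rather than Ignite decides how to terminate, is an assumption, as is the hypothetical `onSegmented` callback. The event type is enabled explicitly via `setIncludeEventTypes`, and returning `false` from the predicate unsubscribes the listener after the first notification.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;
import org.apache.ignite.plugin.segmentation.SegmentationPolicy;

/** Hypothetical wiring for handling segmentation on the local node. */
final class SegmentationAwareNode {
    private SegmentationAwareNode() { }

    static IgniteConfiguration configure() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Enable the segmentation event explicitly so local listeners receive it.
        cfg.setIncludeEventTypes(EventType.EVT_NODE_SEGMENTED);
        // Assumption: let our own handler terminate the node instead of Ignite's default STOP.
        cfg.setSegmentationPolicy(SegmentationPolicy.NOOP);
        return cfg;
    }

    static void registerListener(Ignite ignite, Runnable onSegmented) {
        IgnitePredicate<Event> lsnr = evt -> {
            // Hand off to a separate thread so the Ignite thread delivering the event is not blocked.
            new Thread(onSegmented, "segmentation-handler").start();
            return false; // segmentation is terminal for this node, so stop listening
        };
        ignite.events().localListen(lsnr, EventType.EVT_NODE_SEGMENTED);
    }
}
```

Typical wiring would be `Ignite ignite = Ignition.start(SegmentationAwareNode.configure());` followed by `SegmentationAwareNode.registerListener(ignite, handler)`, where `handler` sends the notification and shuts the process down.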

YevIgn added the enhancement label Apr 26, 2017
