It looks like if a slave dies, some % of slave requests will simply fail. Feature request: support failure detection, and some kind of circuit-breaker recovery mechanism.
There are two kinds of failure in this scenario: transient and permanent.
When a slave fails permanently, the infrastructure shuffling that follows is tightly coupled to each organisation's setup.
One could put VIPs in front of each slave and keep the reshuffling below the application layer, but I don't think most setups are that advanced yet. If the master fails, things get even dirtier with manual or automatic failover: what previously was a slave could now be a master, which breaks all the assumptions in this library. With so many fragile moving parts, this sort of dynamic reconfiguration is perhaps better handled by the user of the library, by re-instantiating the DB object (see the sketch below).
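To make that last point concrete, here is a minimal sketch of what keeping the reconfiguration in the application could look like. Everything in it (the `openCluster` constructor, the `cluster` type, the DSN list) is hypothetical and not part of this library's API:

```go
// Sketch: the application owns the DB handle and swaps in a freshly built one
// when its own tooling detects a failover or topology change. The library is
// never asked to reason about a slave becoming a master.
package dbconf

import "sync/atomic"

// cluster is a hypothetical stand-in for whatever handle the library returns.
type cluster struct {
	// master and slave connections would live here; omitted in this sketch.
	dsns []string
}

// openCluster is a hypothetical constructor; a real application would open
// the master and slave connections from the given DSNs here.
func openCluster(dsns []string) (*cluster, error) {
	return &cluster{dsns: dsns}, nil
}

// current holds the cluster handle the rest of the application reads from.
var current atomic.Pointer[cluster]

// reconfigure re-instantiates the DB object with the new topology and swaps
// it in atomically, leaving the old handles to be drained and closed.
func reconfigure(dsns []string) error {
	db, err := openCluster(dsns)
	if err != nil {
		return err
	}
	old := current.Swap(db)
	_ = old // close the old master/slave handles here once in-flight queries drain
	return nil
}
```

The point is only that the application decides *when* to rebuild, based on whatever failover tooling the organisation already runs.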
With transient failures the story is different, as they can originate from a wide range of events such as:
Network connectivity loss
Network connectivity degradation
Load spikes
Slow queries
Replication lag
Distinguishing between these is hard or impossible from the application layer. How do you envision a circuit-breaking mechanism with such a varied array of failure modes and such coarse detection capabilities? Detection, and more generally infrastructure state introspection, is an orthogonal concern specific to each organisation, and I don't see how I could account for it in a general way. I'm very happy to discuss it, though!
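For reference, this is roughly the coarse mechanism I understand the request to mean: count consecutive errors per slave and take that slave out of rotation for a cooldown period, without trying to tell the failure modes above apart. The `breaker` type, its thresholds and the error handling are all assumptions on my part, not existing code:

```go
// Sketch: a per-slave circuit breaker that trips after a number of
// consecutive failures and reopens after a cooldown, regardless of whether
// the cause was network loss, load, slow queries or replication lag.
package slavecb

import (
	"errors"
	"sync"
	"time"
)

type breaker struct {
	mu        sync.Mutex
	failures  int       // consecutive failures seen so far
	openUntil time.Time // while in the future, the slave is out of rotation

	threshold int           // consecutive failures before opening the circuit
	cooldown  time.Duration // how long the slave stays out of rotation
}

var errOpen = errors.New("circuit open: slave temporarily out of rotation")

// do runs a query against one slave, tracking consecutive failures.
func (b *breaker) do(query func() error) error {
	b.mu.Lock()
	if time.Now().Before(b.openUntil) {
		b.mu.Unlock()
		return errOpen // caller should fall back to another slave or the master
	}
	b.mu.Unlock()

	err := query()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.threshold {
			b.openUntil = time.Now().Add(b.cooldown)
			b.failures = 0
		}
		return err
	}
	b.failures = 0
	return nil
}
```

Even this blunt version raises the policy questions above: where the threshold and cooldown come from is exactly the kind of organisation-specific knob I'd rather not bake into the library.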