Implement cluster membership change support #3
Hi, I'm really interested in this feature and ready to implement it. Could you please help me with an implementation roadmap?
Hey, the notification for this caught my eye. I think you'll both be interested in the newly revised cluster membership changes chapter in my thesis: https://ramcloud.stanford.edu/~ongaro/thesis.pdf . That chapter now describes a simplified form of membership changes that's restricted to single-server removals and additions. I'd recommend implementing that version instead of the full-blown joint consensus approach, unless your use case somehow requires arbitrary configuration changes.
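For concreteness, here is a minimal standalone sketch (not Kontiki code; the NodeId and Config types are hypothetical) of the single-server restriction that chapter describes: a proposed configuration is only accepted if it adds or removes exactly one server, which guarantees that any majority of the old configuration overlaps any majority of the new one.

```haskell
import qualified Data.Set as Set

type NodeId = String
type Config = Set.Set NodeId

-- Accept a proposed configuration only if it adds or removes exactly one
-- server; any majority of the old configuration then overlaps any
-- majority of the new one, so no joint consensus phase is needed.
validSingleServerChange :: Config -> Config -> Bool
validSingleServerChange old new =
    Set.size (old `Set.difference` new) + Set.size (new `Set.difference` old) == 1

main :: IO ()
main = do
    let abc = Set.fromList ["a", "b", "c"]
    print (validSingleServerChange abc (Set.insert "d" abc))  -- True: adds "d"
    print (validSingleServerChange abc (Set.fromList ["a"]))  -- False: removes two servers
```

The thesis additionally requires the leader to wait until the previous configuration entry is committed before accepting another change; that sequencing is not shown here.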
Thanks a lot, Diego. I will read your thesis.
@drchaos, there's no real 'roadmap' for Kontiki... If you'd like to take a stab at cluster membership support, please do! I have no design in mind for now, so do share your thoughts.
I am also working on single-node membership changes. I hope to have something to show by the end of the weekend. I am also going to be adding the states needed to catch up the logs of nodes that are joining.
I would gladly join @sseveran in his work.
So first, a little background on what I am working on. I built a non-replicated Chubby last year as a test and implemented a medium-sized application using it. The file/sequencer model works great, probably even better in Haskell than in C++. I made a monad transformer that made it super easy to deal with locks. My long-term goal is to support the full Chubby feature set, as well as larger-than-memory data, using something like a heavily modified LevelDB or RocksDB, or maybe something else. I also want to support the ZooKeeper wire protocol. I have spoken with several people who would be very interested in something better for use at their companies. Early hackery (not my prototype) is at http://www.github.com/alphaHeavy/hinu

Several generalizations need to be made to Kontiki:

- Config needs to be updatable.
- runTransitionT needs to return an updated config.
- MonadLog should support snapshotting and log replay.
- We need a new non-voting follower state (a sketch follows below).
- The node configuration will be stored at a special reserved key in the transaction log.
- We may want the network state to be parameterized so applications can supply their own types for addressing, e.g. IPv4 vs IPv6.
- There needs to be a way of marking a node as down. This should probably be a function the user passes in. In the UDP example it might be if the queue gets too big.
- We may want to have a maintenance mode so we can continue to buffer entries for a node for much longer, to accommodate things like OS upgrades. For now it's probably fine to just say that is supported by removing a node, performing maintenance, and then rejoining it.
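The non-voting follower item above might amount to something like the following sketch (hypothetical names; Kontiki's actual state types differ): an extra role that replicates the log but is excluded from every quorum, plus a rule for deciding when a joining node has caught up enough to be promoted via a configuration change.

```haskell
-- Illustrative only: Kontiki's real state machine types are different.
type Index = Int

data Role
    = Follower
    | Candidate
    | Leader
    | NonVotingFollower   -- replicates entries, never votes, never counted
    deriving (Show, Eq)

-- Non-voting followers never count towards election or commit quorums.
countsTowardsQuorum :: Role -> Bool
countsTowardsQuorum NonVotingFollower = False
countsTowardsQuorum _                 = True

-- A joining node is considered caught up (and eligible for promotion via
-- a configuration change) once its replicated log is within 'slack'
-- entries of the leader's last index.
caughtUp :: Index   -- ^ leader's last log index
         -> Index   -- ^ joining node's match index
         -> Index   -- ^ allowed slack
         -> Bool
caughtUp leaderLast matchIndex slack = leaderLast - matchIndex <= slack

main :: IO ()
main = do
    print (countsTowardsQuorum NonVotingFollower)  -- False
    print (caughtUp 1000 990 25)                   -- True
```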
I envision a node joining looking like the following:
I have thought a lot less about the removal, but here are a few thoughts.
After a quick read, that sounds at least a little bit buggy. For example, suppose we have a leader of a 1-server cluster: I don't mean to take the fun out of this, but it might be a good idea to use the membership change algorithm described in my thesis as a baseline, then discuss well-motivated changes and extensions to better suit your needs.
@ongardie Is the algorithm specified anywhere? Proof language is fine. I started by simply trying to reverse engineer the text. I don't want to do any novel work if I can avoid it.
@sseveran it's a bit further back in the thread: https://ramcloud.stanford.edu/~ongaro/thesis.pdf
So I missed a step. For the leader to commit the quorum change into the log, it needs to involve the new quorum, not the old quorum. So both Followers and our NonVotingFollower would need to vote to commit it. The NonVotingFollower logic would be the same, except it would begin voting in the round where it sees the cluster configuration being modified.

A couple more thoughts on the boxes on page 54. For number 2 in the SetConfiguration RPC, I am not sure a specified number of rounds is right, since our database may be arbitrarily large. The number of rounds could be dependent on server load. My thought was to have an adjustable timeout for this. For instance, we may wish to join a replica from a remote data center, which may have very different throughput for replicating the existing state.

My larger question about both those boxes is who calls them. I was anticipating that the node would call the leader when it starts up. I don't think I have missed anything that would make this impossible, but I may have.

@ongardie Also, don't worry about taking the fun out of it. I take correctness here very seriously, so if you have any more comments or thoughts, please share them.
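A minimal sketch of the commit rule described above, with hypothetical names (effectiveConfig, isCommitted) that are not part of Kontiki: once the configuration entry is appended at some index, commitment of that index and everything after it is judged against the new configuration, so the joining node's acknowledgements count from that entry onwards.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type NodeId = String
type Index  = Int
type Config = Set.Set NodeId

-- Choose the configuration used to judge commitment of 'idx': the new one
-- if the configuration entry at 'cfgIndex' is at or before 'idx'.
effectiveConfig :: Index -> Config -> Config -> Index -> Config
effectiveConfig cfgIndex oldCfg newCfg idx
    | idx >= cfgIndex = newCfg
    | otherwise       = oldCfg

-- An index is committed once a majority of the effective configuration has
-- replicated it; 'matches' maps each node (including the leader itself) to
-- the highest index known to be in its log.
isCommitted :: Index -> Config -> Map.Map NodeId Index -> Bool
isCommitted idx cfg matches =
    2 * acks > Set.size cfg
  where
    acks = length [ n | n <- Set.toList cfg
                      , Map.findWithDefault 0 n matches >= idx ]

main :: IO ()
main = do
    let oldCfg  = Set.fromList ["a", "b", "c"]
        newCfg  = Set.fromList ["a", "b", "c", "d"]
        matches = Map.fromList [("a", 7), ("b", 7), ("c", 4), ("d", 7)]
        cfg     = effectiveConfig 5 oldCfg newCfg 7   -- entry 5 is the config change
    print (isCommitted 7 cfg matches)   -- True: a, b, d form a majority of the new config
```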
@sseveran Cool. You can definitely have servers automatically call in to add themselves to the cluster; it just depends on how you want the system to work. It's a bit more troubling if the system could automatically reduce its size when servers fail, though, since it would also be reducing its fault tolerance.
Some thoughts:

Re 'Config needs to be updatable': I think this whole cluster membership change can be implemented on top of the existing API. The existing API is polymorphic over the type of values for which consensus is reached, so the whole cluster management stuff could be implemented using a type like `data Entry a = ConfigurationChange Config | UserEntry a`.

Re 'runTransitionT needs to return an updated config': when using the system as depicted above, there's no need for this, and some library code which offers configuration update mechanisms can do this by interpreting the current output of

Re 'MonadLog should support snapshotting and log replay': I'd rather keep Kontiki abstract of an actual log implementation, hence I don't see why snapshotting and replay should be part of it?

Re 'We need a new non-voting follower state': yeah, that's a great idea, similar to Paxos' learner nodes. We added a similar feature to Arakoon in order to increase reliability without incurring more latency impact.

Re 'The node configuration will be stored at a special reserved key in the transaction log': if the polymorphism of what goes in the log is used, this could be in the cluster-membership-as-a-library code as well, without any 'special key'. Actually, a log doesn't have any keys at all, does it?

Re 'We may want the network state to be parameterized so applications can supply their own types for addressing, e.g. IPv4 vs IPv6': I think any network state should be kept and managed by the application, not the library -> there are tons of different use-cases and setups which one can't foresee. You mention the IP version; another example (also from Arakoon) is multiple addresses per cluster node (and only one is used for message transport when using TCP, whichever is available),... IMHO a cluster node should be specified only by its unique identifier. How this is mapped to some network address is up to the application, and could (in the case of a key-value application) be kept using some specific key.

Re 'There needs to be a way of marking a node as down. This should probably be a function the user passes in. In the UDP example it might be if the queue gets too big.': I'm afraid I don't understand the rationale. Why would this be required? IMHO removing a node should be done at the application level, or (more likely) at some higher-level management layer.

Re 'We may want to have a maintenance mode so we can continue to buffer entries for a node for much longer to accommodate things like OS upgrades. For now it's probably fine to just say that is supported by removing a node, performing maintenance and then rejoining it.': I'm not sure I follow here either... The system as-is should handle node outages/network splits/... gracefully, and an OS upgrade is as much a node outage as any other.

Basically, through experience with Arakoon, I think it's important Kontiki retains the following principles:
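The `data Entry a` suggestion above could look roughly like the following sketch; the surrounding names are illustrative and not Kontiki's API. The cluster-membership-as-a-library code derives the active configuration by scanning the log for the most recent ConfigurationChange, so the core library never needs a 'special key'.

```haskell
import           Data.Maybe (fromMaybe, listToMaybe, mapMaybe)
import qualified Data.Set as Set

type NodeId = String
type Config = Set.Set NodeId

-- The value type handed to the consensus layer: either a cluster
-- configuration change or an ordinary user entry.
data Entry a
    = ConfigurationChange Config
    | UserEntry a
    deriving (Show, Eq)

-- The active configuration is simply the most recent ConfigurationChange
-- in the log (given newest-first), falling back to a bootstrap
-- configuration when none has been written yet.
currentConfig :: Config -> [Entry a] -> Config
currentConfig bootstrap newestFirst =
    fromMaybe bootstrap (listToMaybe (mapMaybe configOf newestFirst))
  where
    configOf (ConfigurationChange c) = Just c
    configOf (UserEntry _)           = Nothing

main :: IO ()
main = do
    let bootstrap = Set.fromList ["a", "b", "c"]
        theLog    = [ UserEntry "y"
                    , ConfigurationChange (Set.fromList ["a", "b", "c", "d"])
                    , UserEntry "x" ]
    print (currentConfig bootstrap theLog)   -- fromList ["a","b","c","d"]
```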
Note I didn't read @ongardie's thesis (sorry Diego, should do that sometime soon!), so I'm not familiar with the config change system being discussed yet. I'm familiar with the original 2-stage, 2-quorum mechanism and designed Kontiki in order for that to be possible, given the type of
It'd be useful to implement the 'correct' protocol to change cluster membership/topologies, as discussed (IIRC) in the Raft paper.
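For reference, a minimal sketch (hypothetical names, not Kontiki's API) of the joint-consensus quorum rule from the Raft paper: while the transitional configuration C_old,new is in effect, both elections and commitment require a majority of the old configuration and, separately, a majority of the new one.

```haskell
import qualified Data.Set as Set

type NodeId = String
type Config = Set.Set NodeId

-- A majority of 'cfg' is present among 'votes'.
majority :: Config -> Set.Set NodeId -> Bool
majority cfg votes = 2 * Set.size (cfg `Set.intersection` votes) > Set.size cfg

-- Quorum under joint consensus: majorities of both configurations.
jointQuorum :: Config -> Config -> Set.Set NodeId -> Bool
jointQuorum oldCfg newCfg votes = majority oldCfg votes && majority newCfg votes

main :: IO ()
main = do
    let oldCfg = Set.fromList ["a", "b", "c"]
        newCfg = Set.fromList ["a", "b", "c", "d", "e"]
    print (jointQuorum oldCfg newCfg (Set.fromList ["a", "b", "d"]))  -- True
    print (jointQuorum oldCfg newCfg (Set.fromList ["a", "d", "e"]))  -- False: only one old member
```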