Hi, thanks for joining the community sync earlier, where we were able to get a head start discussing this feature. Your understanding is correct. Currently most of our layout features are just "feature flags" that block requests until finalization, so finalizing them all in one request has not been an issue, but it would be good to have them separated out to make the framework more robust.

When implementing SCM HA finalization I decided to do it this way from the start, so that code can serve as a reference. The Jira with an attached design doc is HDDS-5141. SCM finalization was more complicated because SCM must also manage the datanodes. Unfortunately it currently requires closing all pipelines, although it would be good to get rid of that requirement at a later point. Anyway, there's a lot of complexity in that doc that can be ignored, including all the checkpoint stuff.

A summary of how we could implement this is below:

When the finalize request is received, the OM should write a key to RocksDB indicating that finalization is in progress, then return to the client. A thread on the leader will then be started that sends finalize requests one by one, one for each unfinalized layout feature. Once all features are finalized, the "finalization in progress" key can be removed from the DB.

The extra DB key helps us continue the finalization process across restarts and leader changes. If there is a leader change during finalization, the new leader will see the key and can resume from where the old leader left off. The metadata layout version and the finalizing key are written to the DB through Ratis as part of the same RocksDB write batch, so any node elected leader will have a consistent view of the progress.

In case Ratis (our Raft implementation) needs to catch up a follower by having it install a snapshot, the follower should be able to run its finalization actions when it sees the new layout version in the DB. This code already exists and probably will not need to be changed as part of this feature.

Here are some starting points in the code:
I was just refreshing my memory on this myself, so this is not a complete list of the implementation details. Feel free to ask questions as needed. If you want to work on it, request a Jira account if you have not already so that you can assign the task to yourself.
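To make the summary above a bit more concrete, here is a minimal, self-contained Java sketch of the resumable flow: a progress key is written when finalization starts, the leader finalizes one layout feature per replicated entry, and a new leader can pick up from the persisted layout version. Everything in it is hypothetical and for illustration only: `FinalizationSketch`, `ReplicatedDb`, the key names, and the `maxSteps` parameter (which just simulates a leader losing leadership partway through) are not real Ozone or Ratis classes or APIs, and a plain in-memory map stands in for the Ratis-replicated RocksDB state.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; none of these names exist in Ozone. */
class FinalizationSketch {
  // Assumed key names, invented for this example.
  static final String FINALIZING_KEY = "#FINALIZING";
  static final String LAYOUT_VERSION_KEY = "#LAYOUT_VERSION";

  /** In-memory stand-in for DB state that every OM sees after applying the Ratis log. */
  static final class ReplicatedDb {
    private final Map<String, Integer> table = new HashMap<>();

    ReplicatedDb(int initialLayoutVersion) {
      table.put(LAYOUT_VERSION_KEY, initialLayoutVersion);
    }

    int layoutVersion() {
      return table.get(LAYOUT_VERSION_KEY);
    }

    boolean finalizing() {
      return table.containsKey(FINALIZING_KEY);
    }

    /** Applied when the client's finalize request is committed. */
    void startFinalization() {
      table.put(FINALIZING_KEY, 1);
    }

    /** One committed entry per layout feature; stale or duplicate entries are ignored. */
    void finalizeFeature(int version) {
      if (version == layoutVersion() + 1) {
        table.put(LAYOUT_VERSION_KEY, version);
      }
    }

    /** Applied once every feature has been finalized. */
    void finishFinalization() {
      table.remove(FINALIZING_KEY);
    }
  }

  /**
   * Runs on whichever node is currently leader. It reads progress from the DB,
   * so a new leader resumes where the old one stopped. Returns the number of
   * per-feature requests sent by this call.
   */
  static int driveFinalization(ReplicatedDb db, int latestVersion, int maxSteps) {
    int sent = 0;
    while (db.finalizing() && db.layoutVersion() < latestVersion && sent < maxSteps) {
      db.finalizeFeature(db.layoutVersion() + 1);
      sent++;
    }
    if (db.finalizing() && db.layoutVersion() >= latestVersion) {
      db.finishFinalization();
    }
    return sent;
  }

  public static void main(String[] args) {
    ReplicatedDb db = new ReplicatedDb(1);
    db.startFinalization();
    driveFinalization(db, 4, 1);                 // old leader finalizes one feature, then steps down
    driveFinalization(db, 4, Integer.MAX_VALUE); // new leader sees the key and resumes
    System.out.println("layout version=" + db.layoutVersion() + ", finalizing=" + db.finalizing());
  }
}
```

Because the driver reads only persisted state, calling it again after a restart or leader change is harmless. In this sketch the progress key is removed in a separate step after the last feature; whether that removal should share a write batch with the final version bump is a design choice the sketch does not settle.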
Hello, Apache Community!
I would like to ask you a question about the task HDDS-4286.
In the existing implementation (in master), when the Ozone Client sends OMFinalizeUpgradeRequest, each Ozone Manager receives it. The OM leader handles the request and writes a message to the Ratis log. OM followers do nothing except write a DEBUG message to the (console or file) log. When the message appears in the Ratis log, all OMs apply it in order to finalize all layout features at once.

Behavior proposed by HDDS-4286

Am I right that the purpose of this task is to split OMFinalizeUpgradeRequest so that each layout feature is finalized separately? We would then have as many messages in the Ratis log as there are layout features.

Supposed implementation

OMFinalizeUpgradeRequest
OzoneManagerRatisServer split this request into separate requests (one to finalize each layout feature) and submit them to Ratis
validateAndUpdateCache function.