HBASE-28951 rename the wal before retrying the wal-split with another worker #6534
base: master
@@ -234,9 +234,16 @@ static void requestLogRoll(final WAL wal) {
   static final String DEFAULT_PROVIDER_ID = "default";
 
   // Implementation details that currently leak in tests or elsewhere follow
-  /** File Extension used while splitting an WAL into regions (HBASE-2312) */
+  /**
+   * File Extension used while splitting an WAL into regions (HBASE-2312) This is used with the
+   * directory name/path
+   */
   public static final String SPLITTING_EXT = "-splitting";
 
+  // Extension for the WAL where the split failed on one worker and is being retried on another.
+  // this is used with the WAL file itself
+  public static final String RETRYING_EXT = ".retrying";
+
   /**
    * Pattern used to validate a WAL file name see {@link #validateWALFilename(String)} for
    * description.
While splitting the WAL for the meta table, the WAL name can be rs.XXX.meta.retrying001. Do you think we should update WAL_FILE_NAME_PATTERN? That said, during splitting we don't check for a valid WAL name.
@Apache9 what do you think?
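To make the naming question concrete, here is a minimal sketch of how a validator could recognise a retried WAL name such as rs.XXX.meta.retrying001 and strip the suffix before running an existing name check. The class `RetryingWalName`, its methods, and the regex are hypothetical and not part of HBase. The assumption that the suffix is `RETRYING_EXT` followed by a numeric counter comes only from the example name in the comment above. The sketch does not reproduce `WAL_FILE_NAME_PATTERN`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, for illustration only; not an HBase class. */
final class RetryingWalName {
  // Same value as the constant added in the diff.
  static final String RETRYING_EXT = ".retrying";

  // Assumption: a retried WAL ends with RETRYING_EXT plus one or more digits.
  private static final Pattern RETRYING_SUFFIX =
      Pattern.compile("^(.+?)" + Pattern.quote(RETRYING_EXT) + "(\\d+)$");

  /** True if the name carries a retrying suffix such as ".retrying001". */
  static boolean isRetrying(String walName) {
    return RETRYING_SUFFIX.matcher(walName).matches();
  }

  /** Returns the name without the retrying suffix, or the name unchanged. */
  static String stripRetrying(String walName) {
    Matcher m = RETRYING_SUFFIX.matcher(walName);
    return m.matches() ? m.group(1) : walName;
  }
}
```

With this shape, a caller could pass `stripRetrying(name)` to the existing validation and leave the existing pattern untouched. Whether that is preferable to widening the pattern itself is a design choice for the maintainers.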
Hi, I'm not terribly familiar with wal split.
Does the WAL file get closed by this time? I'm asking because Ozone doesn't yet support renaming open files, and supporting that is quite a big project in itself.
Even though that's not yet a huge problem for HBase, since HBase doesn't run on Ozone by default, it would be great if we didn't attempt to rename open files.
Thanks!
At this time there are cases where the WAL file will still be open. In the current code we call recoverLease on the RS once the master assigns the split to the worker RS. @Apache9 do you think we should move the recoverLease to the master?
Before this rename we also rename the WAL directory for the RS. @jojochuang is renaming a directory different from renaming a file? If directory rename is also not supported when a file inside the directory is open, then we need changes in the current code as well.
I think this should have been done before we rename the wal directory?
Thanks guys.
I think what's more relevant for HBase is that it used to cause race conditions if the WAL files were kept open while being renamed. HBASE-27732 fixed one such bug: because HDFS allows renaming open files, it doesn't fail immediately but causes an NPE later. Ozone fails right away with that bug. It took us a few days to find out.
(I need to check but I think directory rename is fine for Ozone in this case)
Since recoverLease doesn't support directory paths (https://github.com/apache/hadoop/blob/fb1bb6429dfb4e45687e0bc507c5a2ed26bd0bb0/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/leaserecoverable.md), we need to call recoverLease for each file. We also have to call recoverLease before renaming the WAL file.
Btw, at least for Hadoop, I think both orders (recoverLease after the rename or before it) are fine, since renaming is a metadata operation and the data is linked to INodes.
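The ordering discussed here (recover the lease file by file, then rename) can be sketched with plain `java.nio`. `LeaseRecoverer` is a hypothetical stand-in for a file system's per-file recoverLease call, and the class and method names are illustrative only. This is not the PR's implementation and it does not use the Hadoop API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative sketch only; models the ordering, not HBase or Hadoop code. */
final class RecoverThenRename {
  /** Hypothetical stand-in for a per-file lease recovery call. */
  interface LeaseRecoverer {
    boolean recoverLease(Path file) throws IOException;
  }

  /**
   * Recovers the lease of every regular file directly under walDir, one file
   * at a time, because lease recovery is assumed not to accept a directory.
   * Returns the files that were processed, in sorted order.
   */
  static List<Path> recoverAll(Path walDir, LeaseRecoverer recoverer) throws IOException {
    List<Path> files;
    try (Stream<Path> listing = Files.list(walDir)) {
      files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }
    for (Path file : files) {
      if (!recoverer.recoverLease(file)) {
        throw new IOException("Lease not recovered for " + file);
      }
    }
    return files;
  }

  /**
   * Recovers the lease first and renames the file only if that succeeded,
   * so an open file is never renamed. Returns the new path.
   */
  static Path recoverThenRename(Path wal, String newName, LeaseRecoverer recoverer)
      throws IOException {
    if (!recoverer.recoverLease(wal)) {
      throw new IOException("Lease not recovered for " + wal);
    }
    return Files.move(wal, wal.resolveSibling(newName));
  }
}
```

Doing the recovery before the rename keeps the sketch valid for file systems that reject renaming open files, which is the Ozone concern raised earlier in the thread.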