HBASE-28951 rename the wal before retrying the wal-split with another worker #6534

Open: wants to merge 6 commits into master
@@ -723,6 +723,7 @@ enum SplitWALState{
ACQUIRE_SPLIT_WAL_WORKER = 1;
DISPATCH_WAL_TO_WORKER = 2;
RELEASE_SPLIT_WORKER = 3;
RETRY_ON_DIFFERENT_WORKER = 4;
}

message ClaimReplicationQueuesStateData {
@@ -184,4 +184,30 @@ public void releaseSplitWALWorker(ServerName worker, MasterProcedureScheduler sc
public void addUsedSplitWALWorker(ServerName worker) {
splitWorkerAssigner.addUsedWorker(worker);
}

/**
* Rename the WAL file at the specified walPath so the split can be retried with another worker.
* Returns true if the file is successfully renamed, or if it was already renamed in a previous
* attempt. Returns false if neither file exists. Throws an IOException if the rename fails. This
* method is only called after a split fails on one worker, so the normal (no-failure) flow is
* unchanged.
*/
public boolean ifExistRenameWALForRetry(String walPath, String postRenameWalPath)
throws IOException {
if (fs.exists(new Path(rootDir, walPath))) {
if (!fs.rename(new Path(rootDir, walPath), new Path(rootDir, postRenameWalPath))) {
Contributor: Hi, I'm not terribly familiar with WAL split. Does the WAL file get closed by this time? I'm asking because Ozone doesn't yet support renaming open files, and supporting that is quite a big project in itself. Even though that's not yet a huge problem for HBase, since HBase doesn't run on Ozone by default, it would be great if we don't attempt to rename open files. Thanks!

Contributor Author: At this point there are cases where the WAL file will still be open. In the current code we recoverLease on the RS once the master assigns the splitting to the worker RS. @Apache9 do you think we should move the recoverLease to the master? Before this rename we also rename the WAL directory for the RS. @jojochuang is renaming a directory different from renaming a file? If directory renames are also unsupported while some file inside the directory is open, then we need changes in the current code as well.

Contributor: I think this should have been done before we rename the WAL directory?

Contributor: Thanks guys. I think what's more relevant for HBase is that keeping WAL files open while renaming them used to cause race conditions. HBASE-27732 fixed one such bug: because HDFS allows renaming open files, the rename doesn't fail immediately, but it causes an NPE later. Ozone fails right away with that bug; it took us a few days to find out. (I need to check, but I think the directory rename is fine for Ozone in this case.)

Contributor Author: As recoverLease doesn't support directory paths (https://github.com/apache/hadoop/blob/fb1bb6429dfb4e45687e0bc507c5a2ed26bd0bb0/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/leaserecoverable.md), we need to recoverLease on each file, and we also have to recoverLease before renaming the WAL file. By the way, at least for Hadoop I think both orders (recoverLease before or after the rename) are fine, since a rename is a metadata operation and the data is linked to inodes.
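A minimal sketch of the per-file lease-recovery-then-rename order discussed above; this is not code from this PR. It assumes the HBase helper RecoverLeaseFSUtils.recoverFileLease(FileSystem, Path, Configuration) is available, and the class and method names below are otherwise hypothetical.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.util.RecoverLeaseFSUtils;

public final class LeaseThenRenameSketch {
  // Recover the lease on a single WAL file, then rename it for the retry.
  // Lease recovery is done per file because recoverLease does not accept directory paths.
  static void recoverLeaseThenRename(FileSystem fs, Path wal, Path renamed, Configuration conf)
      throws IOException {
    RecoverLeaseFSUtils.recoverFileLease(fs, wal, conf);
    if (!fs.rename(wal, renamed)) {
      throw new IOException("Failed to rename " + wal + " to " + renamed);
    }
  }
}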

throw new IOException("Failed to rename wal " + walPath + " to " + postRenameWalPath);
}
return true;
} else {
if (fs.exists(new Path(rootDir, postRenameWalPath))) {
LOG.info(
"{} was already renamed in last retry to {}, will continue to acquire new worker to split wal",
walPath, postRenameWalPath);
return true;
} else {
return false;
}
}
}
}
@@ -17,6 +17,8 @@
*/
package org.apache.hadoop.hbase.master.procedure;

import static org.apache.hadoop.hbase.wal.AbstractFSWALProvider.RETRYING_EXT;

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.ServerName;
@@ -79,30 +81,82 @@ protected Flow executeFromState(MasterProcedureEnv env, MasterProcedureProtos.Sp
try {
finished = splitWALManager.isSplitWALFinished(walPath);
} catch (IOException ioe) {
if (retryCounter == null) {
retryCounter = ProcedureUtil.createRetryCounter(env.getMasterConfiguration());
}
long backoff = retryCounter.getBackoffTimeAndIncrementAttempts();
LOG.warn("Failed to check whether splitting wal {} success, wait {} seconds to retry",
walPath, backoff / 1000, ioe);
setTimeout(Math.toIntExact(backoff));
setState(ProcedureProtos.ProcedureState.WAITING_TIMEOUT);
skipPersistence();
LOG.warn("Failed to check whether splitting wal {} success", walPath);
setTimeoutForSuspend(env, ioe.getMessage());
throw new ProcedureSuspendedException();
}
splitWALManager.releaseSplitWALWorker(worker, env.getProcedureScheduler());
if (!finished) {
LOG.warn("Failed to split wal {} by server {}, retry...", walPath, worker);
setNextState(MasterProcedureProtos.SplitWALState.ACQUIRE_SPLIT_WAL_WORKER);
setNextState(MasterProcedureProtos.SplitWALState.RETRY_ON_DIFFERENT_WORKER);
return Flow.HAS_MORE_STATE;
}
ServerCrashProcedure.updateProgress(env, getParentProcId());
return Flow.NO_MORE_STATE;
case RETRY_ON_DIFFERENT_WORKER:
// HBASE-28951: In some cases, a RegionServer (RS) becomes unresponsive, which leads the
// master to mistakenly assume the RS has crashed and mark the SplitWALRemoteProcedure as
// failed. As a result, the WAL splitting task is reassigned to another worker. However, if
// the original worker starts responding again, both RegionServers may attempt to process
// the same WAL at the same time, causing both operations to fail. To prevent this conflict,
// we can "fence" the WAL by renaming it.
try {
String postRenameWalPath = getPostRenameWalPath();
if (splitWALManager.ifExistRenameWALForRetry(this.walPath, postRenameWalPath)) {
this.walPath = postRenameWalPath;
setNextState(MasterProcedureProtos.SplitWALState.ACQUIRE_SPLIT_WAL_WORKER);
return Flow.HAS_MORE_STATE;
} else {
// The method ifExistRenameWALForRetry returns false if the WAL file at walPath does not
// exist and has not been renamed to postRenameWalPath. This can only happen if the WAL
// split already completed before the flow reached this point.
ServerCrashProcedure.updateProgress(env, getParentProcId());
return Flow.NO_MORE_STATE;
}
} catch (IOException ioe) {
LOG.warn("Failed to rename the splitting wal {}", walPath);
setTimeoutForSuspend(env, ioe.getMessage());
throw new ProcedureSuspendedException();
}

default:
throw new UnsupportedOperationException("unhandled state=" + state);
}
}

private String getPostRenameWalPath() {
// If the WAL split is retried with a different worker, a ".retryingNNN" suffix is appended to
// the walPath. We cannot maintain a workerChangeCount field in the SplitWALProcedure class
// because of the following scenario: if a SplitWALProcedure is bypassed or rolled back after
// being retried with another worker, the walPath still carries the ".retryingNNN" suffix.
// During recovery, a new SplitWALProcedure (potentially created by a different SCP) will have
// its workerChangeCount initialized to 0, while the walPath retains the suffix from the
// previous retry. To handle all these scenarios we parse the current count from the walPath
// and move to the next count, so the last worker is properly fenced.
String originalWALPath, postRenameWalPath;
int workerChangeCount = 0;
if (walPath.substring(walPath.length() - RETRYING_EXT.length() - 3).startsWith(RETRYING_EXT)) {
originalWALPath = walPath.substring(0, walPath.length() - RETRYING_EXT.length() - 3);
workerChangeCount = Integer.parseInt(walPath.substring(walPath.length() - 3));
} else {
originalWALPath = walPath;
}
return originalWALPath + RETRYING_EXT + String.format("%03d", (workerChangeCount + 1) % 1000);

}

private void setTimeoutForSuspend(MasterProcedureEnv env, String reason) {
if (retryCounter == null) {
retryCounter = ProcedureUtil.createRetryCounter(env.getMasterConfiguration());
}
long backoff = retryCounter.getBackoffTimeAndIncrementAttempts();
LOG.warn("{} can not run currently because {}, wait {} ms to retry", this, reason, backoff);
setTimeout(Math.toIntExact(backoff));
setState(ProcedureProtos.ProcedureState.WAITING_TIMEOUT);
skipPersistence();
}

@Override
protected void rollbackState(MasterProcedureEnv env,
MasterProcedureProtos.SplitWALState splitOneWalState) throws IOException, InterruptedException {
@@ -234,9 +234,16 @@ static void requestLogRoll(final WAL wal) {
static final String DEFAULT_PROVIDER_ID = "default";

// Implementation details that currently leak in tests or elsewhere follow
/** File Extension used while splitting an WAL into regions (HBASE-2312) */
/**
* File extension used while splitting a WAL into regions (HBASE-2312). This is applied to the
* directory name/path.
*/
public static final String SPLITTING_EXT = "-splitting";

// Extension for a WAL whose split failed on one worker and is being retried on another.
// This is applied to the WAL file itself.
public static final String RETRYING_EXT = ".retrying";
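For illustration only (the paths below are made up, not taken from this PR): SPLITTING_EXT decorates the dead server's WAL directory, while RETRYING_EXT plus a three-digit counter decorates the individual WAL file being retried.

import org.apache.hadoop.fs.Path;

public final class WalExtensionExample {
  public static void main(String[] args) {
    // Hypothetical WAL directory of a crashed server, with SPLITTING_EXT on the directory name.
    Path splittingDir = new Path("/hbase/WALs/rs1.example.com,16020,1700000000000" + "-splitting");
    // Hypothetical WAL file inside that directory, with RETRYING_EXT plus a counter on the
    // file name after the first failed split attempt.
    Path retryingWal = new Path(splittingDir,
      "rs1.example.com%2C16020%2C1700000000000.1700000000123" + ".retrying001");
    System.out.println(splittingDir);
    System.out.println(retryingWal);
  }
}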

/**
* Pattern used to validate a WAL file name see {@link #validateWALFilename(String)} for
* description.
Contributor Author (@Umeshkumar9414, Dec 20, 2024): While splitting the WAL for the meta table, the WAL file name can be rs.XXX.meta.retrying001. Do you think we should update WAL_FILE_NAME_PATTERN? Although during splitting we don't check for a valid WAL name.

Contributor Author: @Apache9 what do you think?
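A sketch, not the actual WAL_FILE_NAME_PATTERN, of how a validation pattern could tolerate the optional meta and retry suffixes discussed above; the pattern and names here are hypothetical.

import java.util.regex.Pattern;

public final class RetryingWalNameSketch {
  // Hypothetical pattern: a server name, a dot, a timestamp, then an optional ".meta" and an
  // optional ".retrying" plus a three-digit counter. The real pattern in AbstractFSWALProvider
  // is more involved.
  private static final Pattern NAME =
    Pattern.compile(".+\\.\\d+(\\.meta)?(\\.retrying\\d{3})?");

  public static void main(String[] args) {
    System.out.println(NAME.matcher("rs1%2C16020%2C1700000000000.1700000000123").matches());
    System.out.println(
      NAME.matcher("rs1%2C16020%2C1700000000000.1700000000123.meta.retrying001").matches());
  }
}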
@@ -19,10 +19,13 @@

import static org.apache.hadoop.hbase.HConstants.HBASE_SPLIT_WAL_COORDINATED_BY_ZK;
import static org.apache.hadoop.hbase.HConstants.HBASE_SPLIT_WAL_MAX_SPLITTER;
import static org.apache.hadoop.hbase.master.MasterWalManager.NON_META_FILTER;
import static org.apache.hadoop.hbase.master.procedure.ServerProcedureInterface.ServerOperationType.SPLIT_WAL;
import static org.apache.hadoop.hbase.wal.AbstractFSWALProvider.RETRYING_EXT;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.hadoop.fs.FileStatus;
@@ -42,6 +45,7 @@
import org.apache.hadoop.hbase.procedure2.ProcedureTestingUtility;
import org.apache.hadoop.hbase.procedure2.ProcedureYieldException;
import org.apache.hadoop.hbase.procedure2.StateMachineProcedure;
import org.apache.hadoop.hbase.regionserver.HRegionServer;
import org.apache.hadoop.hbase.testclassification.LargeTests;
import org.apache.hadoop.hbase.testclassification.MasterTests;
import org.apache.hadoop.hbase.util.Bytes;
@@ -94,6 +98,37 @@ public void teardown() throws Exception {
TEST_UTIL.shutdownMiniCluster();
}

@Test
public void testRenameWALForRetryWALSplit() throws Exception {
HRegionServer regionServer = TEST_UTIL.getHBaseCluster().getRegionServer(0);
List<Path> logDirs =
master.getMasterWalManager().getLogDirs(Collections.singleton(regionServer.getServerName()));
List<FileStatus> list =
SplitLogManager.getFileList(TEST_UTIL.getConfiguration(), logDirs, NON_META_FILTER);
Path testWal = list.get(0).getPath();

String newWALPath = testWal + RETRYING_EXT + "001";
boolean result = splitWALManager.ifExistRenameWALForRetry(testWal.toString(), newWALPath);
Assert.assertTrue(result);

List<FileStatus> fileStatusesA =
SplitLogManager.getFileList(TEST_UTIL.getConfiguration(), logDirs, NON_META_FILTER);
Path walFromFileSystem = null;
for (FileStatus wal : fileStatusesA) {
if (wal.getPath().toString().endsWith(RETRYING_EXT + "001")) {
walFromFileSystem = wal.getPath();
break;
}
}
Assert.assertNotNull(walFromFileSystem);
Assert.assertEquals(walFromFileSystem.toString(), newWALPath);

// if file is already renamed it should return true
boolean resultAfterRename =
splitWALManager.ifExistRenameWALForRetry(testWal.toString(), newWALPath);
Assert.assertTrue(resultAfterRename);
}

@Test
public void testAcquireAndRelease() throws Exception {
List<FakeServerProcedure> testProcedures = new ArrayList<>();
@@ -166,7 +201,7 @@ public void testCreateSplitWALProcedures() throws Exception {
Assert.assertFalse(TEST_UTIL.getTestFileSystem().exists(wals[0].getPath()));

// Test splitting wal
wals = TEST_UTIL.getTestFileSystem().listStatus(metaWALDir, MasterWalManager.NON_META_FILTER);
wals = TEST_UTIL.getTestFileSystem().listStatus(metaWALDir, NON_META_FILTER);
Assert.assertEquals(1, wals.length);
testProcedures =
splitWALManager.createSplitWALProcedures(Lists.newArrayList(wals[0]), metaServer);