Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBASE-29068: Report percentFilesLocalPrimaryRegions metric #6596

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -314,10 +314,16 @@ public interface MetricsRegionServerSource extends BaseSource, JvmPauseMonitorSo
"Size of data that has been sent by clients with the write ahead logging turned off.";
String PERCENT_FILES_LOCAL = "percentFilesLocal";
String PERCENT_FILES_LOCAL_DESC =
"The percent of HFiles that are stored on the local hdfs data node.";
"The percent of HFiles that are stored on the local hdfs data node. If not using "
+ "region replicas, this should equal percentFilesLocalPrimaryRegions";
String PERCENT_FILES_LOCAL_PRIMARY_REGIONS = "percentFilesLocalPrimaryRegions";
String PERCENT_FILES_LOCAL_PRIMARY_REGIONS_DESC =
"The percent of HFiles used by primary regions that are stored on the local hdfs data node. "
+ "This is the category of locality that you want to reach 100% when using region replicas";
String PERCENT_FILES_LOCAL_SECONDARY_REGIONS = "percentFilesLocalSecondaryRegions";
String PERCENT_FILES_LOCAL_SECONDARY_REGIONS_DESC =
"The percent of HFiles used by secondary regions that are stored on the local hdfs data node.";
"The percent of HFiles used by secondary regions that are stored on the local hdfs data node. "
+ "This is not likely to reach 100%";
Comment on lines -320 to +326
Copy link
Contributor

@rmdmattingly rmdmattingly Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it's unlikely, but wouldn't it be possible to achieve 100% locality for both primary and secondary replicas if your number of replicas is less than or equal to your hdfs replication factor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically that's possible. I say it's unlikely because something would need to ensure that all the block replicas for a secondary region end up co-located with the region, and nothing I'm aware of would do that. Compactions won't help. Internally at hubspot we have the locality healer, but today even that doesn't know about secondary regions, although I plan on that eventually.

Copy link
Contributor

@rmdmattingly rmdmattingly Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The balancer is aware of locality — I wonder to what extent LocalityBasedCandidateGenerator could maintain secondary replica locality. I haven't looked at the implementation with replicas in mind, but it might already balance secondary replicas based on locality to some extent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I would still argue that that isn't likely to be very helpful. The region balancer needs to put a region on exactly one server, but in the absence of some helpful force, it's unlikely that all the blocks for any given region exist on a single DataNode. The region balancer can only do as well as the random block distribution allows. Before we had the locality healer at hubspot, we experienced this difficultly all the time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point

String SPLIT_QUEUE_LENGTH = "splitQueueLength";
String SPLIT_QUEUE_LENGTH_DESC = "Length of the queue for splits.";
String COMPACTION_QUEUE_LENGTH = "compactionQueueLength";
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -557,6 +557,9 @@ private MetricsRecordBuilder addGaugesToMetricsRecordBuilder(MetricsRecordBuilde
rsWrap.getDataInMemoryWithoutWAL())
.addGauge(Interns.info(PERCENT_FILES_LOCAL, PERCENT_FILES_LOCAL_DESC),
rsWrap.getPercentFileLocal())
.addGauge(
Interns.info(PERCENT_FILES_LOCAL_PRIMARY_REGIONS, PERCENT_FILES_LOCAL_PRIMARY_REGIONS_DESC),
rsWrap.getPercentFileLocalPrimaryRegions())
.addGauge(Interns.info(PERCENT_FILES_LOCAL_SECONDARY_REGIONS,
PERCENT_FILES_LOCAL_SECONDARY_REGIONS_DESC), rsWrap.getPercentFileLocalSecondaryRegions())
.addGauge(Interns.info(TOTAL_BYTES_READ, TOTAL_BYTES_READ_DESC), rsWrap.getTotalBytesRead())
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -229,6 +229,11 @@ public interface MetricsRegionServerWrapper {
*/
double getPercentFileLocal();

/**
* Get the percent of HFiles' that are local for primary region replicas.
*/
double getPercentFileLocalPrimaryRegions();

/**
* Get the percent of HFiles' that are local for secondary region replicas.
*/
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -612,6 +612,11 @@ public double getPercentFileLocal() {
return aggregate.percentFileLocal;
}

@Override
public double getPercentFileLocalPrimaryRegions() {
return aggregate.percentFileLocalPrimaryRegions;
}

@Override
public double getPercentFileLocalSecondaryRegions() {
return aggregate.percentFileLocalSecondaryRegions;
Expand Down Expand Up @@ -762,6 +767,7 @@ private static final class RegionMetricAggregate {
private long numMutationsWithoutWAL = 0;
private long dataInMemoryWithoutWAL = 0;
private double percentFileLocal = 0;
private double percentFileLocalPrimaryRegions = 0;
private double percentFileLocalSecondaryRegions = 0;
private long flushedCellsCount = 0;
private long compactedCellsCount = 0;
Expand Down Expand Up @@ -794,6 +800,7 @@ private RegionMetricAggregate(RegionMetricAggregate other) {
private void aggregate(HRegionServer regionServer,
Map<String, ArrayList<Long>> requestsCountCache) {
HDFSBlocksDistribution hdfsBlocksDistribution = new HDFSBlocksDistribution();
HDFSBlocksDistribution hdfsBlocksDistributionPrimaryRegions = new HDFSBlocksDistribution();
HDFSBlocksDistribution hdfsBlocksDistributionSecondaryRegions = new HDFSBlocksDistribution();

long avgAgeNumerator = 0;
Expand Down Expand Up @@ -821,6 +828,9 @@ private void aggregate(HRegionServer regionServer,

HDFSBlocksDistribution distro = r.getHDFSBlocksDistribution();
hdfsBlocksDistribution.add(distro);
if (r.getRegionInfo().getReplicaId() == RegionInfo.DEFAULT_REPLICA_ID) {
hdfsBlocksDistributionPrimaryRegions.add(distro);
}
if (r.getRegionInfo().getReplicaId() != RegionInfo.DEFAULT_REPLICA_ID) {
hdfsBlocksDistributionSecondaryRegions.add(distro);
}
Expand All @@ -832,6 +842,11 @@ private void aggregate(HRegionServer regionServer,
hdfsBlocksDistribution.getBlockLocalityIndex(regionServer.getServerName().getHostname());
percentFileLocal = Double.isNaN(localityIndex) ? 0 : (localityIndex * 100);

float localityIndexPrimaryRegions = hdfsBlocksDistributionPrimaryRegions
.getBlockLocalityIndex(regionServer.getServerName().getHostname());
percentFileLocalPrimaryRegions =
Double.isNaN(localityIndexPrimaryRegions) ? 0 : (localityIndexPrimaryRegions * 100);

float localityIndexSecondaryRegions = hdfsBlocksDistributionSecondaryRegions
.getBlockLocalityIndex(regionServer.getServerName().getHostname());
percentFileLocalSecondaryRegions =
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -262,6 +262,11 @@ public double getPercentFileLocal() {
return 99;
}

@Override
public double getPercentFileLocalPrimaryRegions() {
return 99;
}

@Override
public double getPercentFileLocalSecondaryRegions() {
return 99;
Expand Down