Rebalancing an upsert table causing high GC and failure to reconnect to ZK #14301

Open
dang-stripe opened this issue Oct 24, 2024 · 2 comments
Comments

@dang-stripe
Contributor

Follow-up to apache/helix#2951, which provides more detail.

We rebalanced an upsert table in low-disk mode. This led to high GC on one server, which then kept trying to reconnect to ZK. The server did not recover until we manually restarted it.

@Jackie-Jiang had a theory that this might be tied to the metadata manager for old partitions not being released even after all of their segments were dropped, which would leave a large, empty concurrent hash map on the heap and cause GC pressure.
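
To make the theory concrete, here is a minimal Java sketch of the suspected pattern. It is not Pinot's actual code: `PartitionManagerSketch`, `PartitionMetadata`, and the `releaseWhenEmpty` flag are hypothetical names made up for illustration. It only shows how a per-partition manager can stay reachable after its last segment is dropped, and what releasing it would look like.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Pinot code: a table-level registry of per-partition upsert metadata.
class PartitionManagerSketch {
    // Hypothetical stand-in for a per-partition metadata manager.
    static final class PartitionMetadata {
        // ConcurrentHashMap does not shrink its internal table when entries are removed,
        // so a map that once held many keys keeps its large backing array while reachable.
        final Map<String, Integer> primaryKeyToDocId = new ConcurrentHashMap<>();
        int segmentCount = 0;
    }

    private final Map<Integer, PartitionMetadata> partitions = new ConcurrentHashMap<>();

    void addSegment(int partitionId, String segmentName, int numKeys) {
        PartitionMetadata metadata = partitions.computeIfAbsent(partitionId, id -> new PartitionMetadata());
        for (int i = 0; i < numKeys; i++) {
            metadata.primaryKeyToDocId.put(segmentName + "_" + i, i);
        }
        metadata.segmentCount++;
    }

    void removeSegment(int partitionId, String segmentName, int numKeys, boolean releaseWhenEmpty) {
        PartitionMetadata metadata = partitions.get(partitionId);
        if (metadata == null) {
            return;
        }
        for (int i = 0; i < numKeys; i++) {
            metadata.primaryKeyToDocId.remove(segmentName + "_" + i);
        }
        metadata.segmentCount--;
        // Without this step the empty manager, and its oversized map, stays on the heap.
        if (releaseWhenEmpty && metadata.segmentCount == 0) {
            partitions.remove(partitionId);
        }
    }

    int trackedPartitions() {
        return partitions.size();
    }

    public static void main(String[] args) {
        PartitionManagerSketch leaky = new PartitionManagerSketch();
        leaky.addSegment(0, "seg", 1000);
        leaky.removeSegment(0, "seg", 1000, false);
        System.out.println("without release: " + leaky.trackedPartitions());

        PartitionManagerSketch released = new PartitionManagerSketch();
        released.addSegment(0, "seg", 1000);
        released.removeSegment(0, "seg", 1000, true);
        System.out.println("with release: " + released.trackedPartitions());
    }
}
```

Whether this is what actually happens on the affected server is still unconfirmed; a heap dump showing retained, empty per-partition maps would support or rule out the theory.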

@Jackie-Jiang
Contributor

cc @klsince @tibrewalpratik17

@tibrewalpratik17
Contributor

#11626 might also be related: there we saw similar behaviour after a long GC pause, where the helix-pending-messages metric spiked and did not recover.
