Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event loop occasionally hangs after redisClusterAsyncDisconnect under high connection error conditions #219

Closed
plainbanana opened this issue May 18, 2024 · 2 comments

Comments

@plainbanana
Copy link
Contributor

plainbanana commented May 18, 2024

Hello hiredis-cluster team,

We have encountered a situation where the event loop hangs after issuing redisClusterAsyncDisconnect in conditions where connection errors are likely to occur.

Here is a reproducible example on the current master branch. The program hangs only when .heavyoperation is set to true. Sorry for the rough example.
https://gist.github.com/plainbanana/0f287c6afec4527dba566d01e98aca63

I may be wrong, but I have looked into possible causes:

  1. A debug sleep 1 added in redisClusterAsyncCommandToNode timed out.
  2. This timeout triggered updateSlotMapAsync, adding a CLUSTER NODES command.
  3. A heavy operation (e.g. sleep 1) was executed in the commandCallback.
  4. redisClusterAsyncDisconnect was executed in the commandCallback.
  5. Due to the timeout of the CLUSTER NODES command (?), a retry was attempted in clusterNodesReplyCallback, which established a new connection and added another CLUSTER NODES command.
  6. The hiredis redisProcessCallbacks did not recognize REDIS_DISCONNECTING, leading to a hang.

Is there a way to avoid the hang and successfully terminate the connections in such a scenario? In my humble opinion, it might be better not to initiate a new connection in (5) if redisClusterAsyncDisconnect has already been issued. I would appreciate your input.

Thank you.

@bjosv
Copy link
Collaborator

bjosv commented May 20, 2024

In my humble opinion, it might be better not to initiate a new connection in (5) if redisClusterAsyncDisconnect has already been issued.

Yes, it sounds reasonable that hiredis-cluster shouldn't initiate new connections when a user wants to disconnect.
When disconnecting the flag REDIS_DISCONNECTING is set in hiredis which blocks new commands on existing nodes.
But we should probably add a similar flag in hiredis-cluster to avoid creating new connections to nodes.

@bjosv
Copy link
Collaborator

bjosv commented Aug 15, 2024

Closed by #225

@bjosv bjosv closed this as completed Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants