High connection count on Redis Cluster shard 0001 with redis engine 6.2+ #3478

Open
mikecoder5 opened this issue Jan 11, 2025 · 0 comments

Version: redis-py 5.2.1 - redis engine 6.2.6

Platform: Python 3.9 on Debian 12 / AWS

Description: Upgrading from redis-py-cluster to the latest redis-py causes a large, imbalanced increase in connection count (about 10x) on shard 1, node 1 only. The client is created as RedisCluster(host=aws_configuration_endpoint), where the configuration endpoint redirects to a "random" redis node. The connection count problem happens with redis engine 6.2.6 but not with 5.0.6.

Suspected Core Issues:

  • Connection count: There may be unnecessary connections being opened (2-3x increase in new connection count compared with the old redis-py-cluster library)
  • First node in cluster slots bias: Because of how the configuration endpoint works, the initial cluster slots command is issued to an effectively random redis node. The load on node 0001 therefore suggests that RedisCluster client initialization issues additional commands (or somehow opens unnecessary extra connections) to the first node returned in the cluster slots list, which for engine 6.2+ is node 0001. If additional redis calls are needed for client initialization, ideally the client would reuse the node that answered the initial cluster slots call, or select a random one.
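
The "select a random one" idea can be tried from application code. The following is a minimal sketch, not a proposed patch: randomize_default_node is a hypothetical helper name, while get_primaries() and set_default_node() are existing RedisCluster methods. It only changes which node is the default after the client has been built, so it cannot affect any connections opened during initialization itself.

```python
import random


def randomize_default_node(client, rng=random):
    """Point the cluster client's default node at a randomly chosen primary.

    `client` is expected to be a redis.RedisCluster instance. It is duck-typed
    here so the helper can be exercised without a live cluster.
    """
    primaries = client.get_primaries()
    if not primaries:
        return None  # nothing to choose from; leave the default untouched
    node = rng.choice(primaries)
    client.set_default_node(node)
    return node
```

Usage would be randomize_default_node(RedisCluster(host=aws_configuration_endpoint)) right after constructing the client.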

Observations:

  • Elevated connections only on node 0001 (other primary nodes on other shards are normal)
  • redis-py get_default_node() behavior
    • redis engine 6.2.6 - always returns the same node 0001
    • redis engine 5.0.6 - returns a seemingly random node
  • redis-cli cluster slots ordering of list of slots and nodes
    • redis engine 5.0.6 uses "random" ordering
      • More specifically, ordering is stable for a given node (calling the same node multiple times returns the same list in the same order), but each node has its own seemingly random ordering. So as long as the cluster slots command is issued to a random node each time (which is the case when using the configuration endpoint), the response is effectively a randomly ordered list
    • redis engine 6.2.6 uses sorted ordering
      • Regardless of which node is called, the first slot in cluster slots is always slot 0 and as a result it's always node 0001
  • redis-py always sets the default node to the first node returned by cluster slots (code link)
    • This is easy to change. However, calling replace_default_node() after the client is initialized does not fix the connection count issue. This probably means the root cause is in initialization, where the client issues additional commands to the default node (unsure if it's before or after self.default_node is set)
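
The interaction between reply ordering and "first entry becomes the default node" can be illustrated without a cluster. This is a toy simulation under stated assumptions: the node names are made up, and the per-node ordering for engine 5.0 is modelled as a simple rotation, which only stands in for "each node has its own stable order" and is not how Redis derives it.

```python
import random

NODES = ["node-0001", "node-0002", "node-0003"]  # hypothetical primaries


def slots_reply_sorted(contacted):
    """Engine 6.2-style reply: same order whichever node answers, slot 0 owner first."""
    return list(NODES)


def slots_reply_per_node(contacted):
    """Engine 5.0-style reply, modelled as a stable per-node rotation (an assumption)."""
    i = NODES.index(contacted)
    return NODES[i:] + NODES[:i]


def observed_default_nodes(slots_reply, trials=200, seed=0):
    """Create `trials` simulated clients and collect the default node each one picks."""
    rng = random.Random(seed)
    defaults = set()
    for _ in range(trials):
        contacted = rng.choice(NODES)  # configuration endpoint picks a node
        reply = slots_reply(contacted)
        defaults.add(reply[0])         # client takes the first entry as default
    return defaults


print(observed_default_nodes(slots_reply_sorted))    # {'node-0001'}
print(observed_default_nodes(slots_reply_per_node))  # several different nodes
```

With sorted replies every simulated client ends up on the same default node, which matches the get_default_node() behavior observed on 6.2.6.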

Sample Code: This does not reproduce the high connection count, but it does show the cluster slots sorting behavior and the client bias for the first node in the list.

from redis import RedisCluster

redis_5_0_host = ""  # fill in with configuration endpoint for cluster running redis 5.0.6
redis_6_2_host = ""  # fill in with configuration endpoint for cluster running redis 6.2.6

r5_nodes = []
r6_nodes = []
for _ in range(20):
    # Each new client issues its initial cluster slots call via the configuration endpoint.
    r5_client = RedisCluster(host=redis_5_0_host, port=6379)
    r6_client = RedisCluster(host=redis_6_2_host, port=6379)
    r5_nodes.append(r5_client.get_default_node().host)
    r6_nodes.append(r6_client.get_default_node().host)
    # Close the clients so the loop itself does not accumulate connections.
    r5_client.close()
    r6_client.close()

print(set(r5_nodes))  # most/all of the primary nodes in the cluster
print(set(r6_nodes))  # only one node
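
Since the sample does not measure connection counts, a per-node reading of INFO clients may help confirm the imbalance from the client side. This is a rough sketch: connected_clients_by_node and outliers are hypothetical helper names and the 3x-median threshold is arbitrary, while get_primaries(), get_redis_connection() and info() are existing redis-py calls. Note that taking the measurement itself uses a connection to each primary.

```python
import statistics


def connected_clients_by_node(client):
    """Return {"host:port": connected_clients} for every primary.

    `client` is expected to be a redis.RedisCluster instance. It is duck-typed
    here so the function can be tried against a stub.
    """
    counts = {}
    for node in client.get_primaries():
        conn = client.get_redis_connection(node)  # per-node Redis client
        info = conn.info("clients")               # INFO clients section
        counts[f"{node.host}:{node.port}"] = info["connected_clients"]
    return counts


def outliers(counts, factor=3.0):
    """Names of nodes whose count exceeds `factor` times the median count."""
    if not counts:
        return []
    threshold = factor * statistics.median(counts.values())
    return sorted(name for name, n in counts.items() if n > threshold)
```

For example, outliers(connected_clients_by_node(r6_client)) would be expected to list only node 0001 if the imbalance described above is present.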