[Issue] Cross cluster replication is too slow between Opensearch clusters deployed on two separate kubernetes clusters. #1385
Comments
We deleted the follower index again and it was recreated, but it took several hours to replicate 14 GB of index data to the secondary site. The replication status now shows the following:
Can someone please acknowledge and help with this issue?
There isn't an easy way to figure out why performance is slow here. My suspicion is that either:
We were able to trace the source of the issue to the number of socket connections to the remote cluster.
Here we could see that num_proxy_sockets_connected had reached the value of max_proxy_socket_connections, which is 18. The replication tasks were also stuck at this number.
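The check described above can be sketched roughly as follows. This is only an illustration, not a verified procedure: it assumes jq is installed, that the remote connection uses proxy mode, and that `OS_URL` and `OS_AUTH` (hypothetical variable names with placeholder values) point at the follower cluster. The helper function names are made up for this example.

```shell
#!/bin/sh
# Assumptions: placeholder host and credentials; jq is available.
OS_URL="${OS_URL:-http://localhost:9200}"
OS_AUTH="${OS_AUTH:-user:password}"   # placeholder, replace with real credentials

# Succeeds (exit 0) when every allowed proxy socket is already in use.
proxy_sockets_saturated() {
    connected="$1"
    max="$2"
    [ "$connected" -ge "$max" ]
}

# Prints "<alias> <connected> <max>" for each remote cluster in proxy mode.
remote_proxy_socket_usage() {
    curl -s -k -u "$OS_AUTH" -XGET "$OS_URL/_remote/info" |
        jq -r 'to_entries[]
               | select(.value.mode == "proxy")
               | "\(.key) \(.value.num_proxy_sockets_connected) \(.value.max_proxy_socket_connections)"'
}

# Reads "<alias> <connected> <max>" lines on stdin and reports saturated remotes.
report_saturated_remotes() {
    while read -r alias_name connected max; do
        if proxy_sockets_saturated "$connected" "$max"; then
            echo "$alias_name: all $max proxy sockets in use"
        fi
    done
}
```

Running `remote_proxy_socket_usage | report_saturated_remotes` on the follower should print a line for each remote whose connected socket count has reached its maximum.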
We first increased proxy_socket_connections to 100, and all the pending indices were replicated immediately. (NOTE: the cluster settings API didn't accept the max_proxy_socket_connections field but accepted proxy_socket_connections.)
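For anyone wanting to try the same change, here is a minimal sketch of the settings update. It assumes the setting lives under `cluster.remote.<alias>.proxy_socket_connections` and can be changed as a persistent cluster setting. The alias `my-leader`, the variables `OS_URL` and `OS_AUTH`, and the function names are placeholders, not values from our setup.

```shell
#!/bin/sh
# Assumptions: placeholder host and credentials.
OS_URL="${OS_URL:-http://localhost:9200}"
OS_AUTH="${OS_AUTH:-user:password}"   # placeholder, replace with real credentials

# Builds the cluster-settings body for a connection alias and a socket limit.
proxy_socket_settings_body() {
    alias_name="$1"
    limit="$2"
    printf '{"persistent":{"cluster":{"remote":{"%s":{"proxy_socket_connections":%s}}}}}' \
        "$alias_name" "$limit"
}

# Sends the body to the cluster settings API on the follower cluster.
set_proxy_socket_connections() {
    proxy_socket_settings_body "$1" "$2" |
        curl -s -k -u "$OS_AUTH" -XPUT "$OS_URL/_cluster/settings" \
             -H 'Content-Type: application/json' --data-binary @-
}
```

For example, `set_proxy_socket_connections my-leader 100` would request a limit of 100 for the hypothetical alias `my-leader`.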
We could see new replication tasks being created and the pending indices replicating successfully.
However, when we now check the remote/info API, num_proxy_sockets_connected and max_proxy_socket_connections show the same number.
Now, when I create a new index on the leader, it is not replicated until I increase proxy_socket_connections to a higher value, which allows the follower to open new replication connections. Is there any way we can set just the
Describe the bug
Environment information:
VERSION: opensearch:2.9.0-release-4.14.0-29.12.2023
We have two OpenSearch clusters deployed on two different Kubernetes clusters in Azure.
We have enabled cross-cluster replication between these two OpenSearch clusters.
Whenever we push documents into indices on the leader cluster, they take much longer than expected to replicate.
We have tried with indices that are kilobytes in size as well as larger indices that are gigabytes in size.
In both cases, we observed a definite lag between pushing the documents to the leader site and the same documents becoming available on the follower site.
We pushed an index, 'projecttask', to the leader site. These are the stats from the leader site.
While we monitored the remote site, the replication status on the follower site showed BOOTSTRAPPING for a long time. The status now shows SYNCING, but the sync is still taking a long time.
We captured the output of the replication status API several times yesterday to monitor progress. The bytes_percent and files_percent values show that it is progressing very gradually. We are seeking help on how to troubleshoot this further and speed up the replication.
curl -XGET -k -u ':' 'http://localhost:9200/_plugins/_replication/projecttask/_status?pretty'
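To collect the status at regular intervals, the call above could be wrapped in a small polling loop. This is a rough sketch: the function names, the 60-second default interval, and the `OS_URL` and `OS_AUTH` variables are placeholders chosen for illustration.

```shell
#!/bin/sh
# Assumptions: placeholder host and credentials.
OS_URL="${OS_URL:-http://localhost:9200}"
OS_AUTH="${OS_AUTH:-user:password}"   # placeholder, replace with real credentials

# Builds the replication status URL for a follower index.
replication_status_url() {
    printf '%s/_plugins/_replication/%s/_status?pretty' "$OS_URL" "$1"
}

# Prints a UTC timestamp and the raw status response every <interval> seconds.
# Runs until interrupted (Ctrl-C).
poll_replication_status() {
    index="$1"
    interval="${2:-60}"
    while true; do
        date -u '+%Y-%m-%dT%H:%M:%SZ'
        curl -s -k -u "$OS_AUTH" -XGET "$(replication_status_url "$index")"
        sleep "$interval"
    done
}
```

Calling `poll_replication_status projecttask 300` would log the status every five minutes, which makes it easier to compare successive readings.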
Related component
Plugins
To Reproduce
Expected behavior
Replicating the leader index to the follower site should not take hours.