
When the network is abnormally disconnected, the connection is not restored and subsequent access produces a timeout error #1782

Closed
Adam-Jin opened this issue Jun 26, 2021 · 6 comments


@Adam-Jin

I have some microservices hosted in a three-node Docker Swarm cluster, and they rely on a single-node Redis instance on the same custom network.

Under normal circumstances everything is fine. When I test the Swarm cluster by shutting down the Docker daemon on the node where the Redis instance is running, or by simply shutting down that machine, a new Redis instance is quickly created on another node, and the services reach the new instance through its domain name without any problem.

But when I change the test method and use "systemctl stop network" to shut down the network service of the node where the Redis instance is running, the services never reconnect to Redis, even after a new Redis instance is created on another machine; every access produces a timeout error. In that situation I tried connecting to Redis with the redis-cli client from inside the service container, and everything worked: I could set and get key values correctly. This suggests the timeout is not caused by the Swarm network itself.

I also tried setting keepAlive in the connection string and upgrading the NuGet package to the latest version, but neither attempt helped.
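For reference, the keepAlive and abortConnect settings can also be expressed through ConfigurationOptions. The following is a minimal sketch based on the connection string reported below; the endpoint name and password are just the values from this report, not recommendations:

```csharp
using StackExchange.Redis;

// Sketch only: parse the connection string from this report, then show the
// equivalent programmatic settings (values here are assumptions/examples).
var options = ConfigurationOptions.Parse(
    "console-net_redis:6379,password=123456,ssl=False,abortConnect=False,keepAlive=30");

options.KeepAlive = 30;             // seconds between keep-alive pings
options.AbortOnConnectFail = false; // keep retrying instead of failing the first connect
options.ConnectTimeout = 5000;      // milliseconds; example value, not a recommendation

IConnectionMultiplexer muxer = ConnectionMultiplexer.Connect(options);
IDatabase db = muxer.GetDatabase();
```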

I have also seen this problem reported before, and that report seems to describe a similar situation to mine.
Do you have any suggestions?

Redis: 6.0.5
StackExchange.Redis: 2.2.50
RedisConnectionString: console-net_redis:6379,password=123456,ssl=False,abortConnect=False,keepAlive=30
Exception:

[[bis]] [2021-06-26 06:28:55.409] [err] [production] [apigateway] [491241383e034d3ab3ea70a5fc724e4d] Connection id "0HM9ODBMBJ9PU", Request id "0HM9ODBMBJ9PU:00000014": An unhandled exception was thrown by the application. StackExchange.Redis.RedisTimeoutException: Timeout performing GET (5000ms), next: PING, inst: 0, qu: 0, qs: 19, aw: False, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: console-net_redis:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: 2c3f5c0383c1, IOCP: (Busy=0,Free=1000,Min=200,Max=1000), WORKER: (Busy=1,Free=32766,Min=200,Max=32767), v: 2.1.30.38891 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2624
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/RedisBase.cs:line 54
at StackExchange.Redis.RedisDatabase.StringGet(RedisKey key, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 2374
at Encoo.Console.ClusterServices.ApiGateway.Caching.CacheManager`1.Get(String key, String region) in /src/ClusterServices/Gateways/ApiGateway/Caching/CacheManager.cs:line 59
at Encoo.Console.ClusterServices.ApiGateway.IdentityUserProvider.GetUserAsync(HttpContext context, CancellationToken token) in /src/ClusterServices/Gateways/ApiGateway/Services/IdentityUserProvider.cs:line 35
at Encoo.Console.ClusterServices.ApiGateway.AuthorizeHeadersMiddleware.InvokeAsync(HttpContext context) in /src/ClusterServices/Gateways/ApiGateway/Middleware/AuthorizeHeadersMiddleware.cs:line 29
at Microsoft.AspNetCore.MiddlewareAnalysis.AnalysisMiddleware.Invoke(HttpContext httpContext)
at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
at Microsoft.AspNetCore.MiddlewareAnalysis.AnalysisMiddleware.Invoke(HttpContext httpContext)
at Microsoft.AspNetCore.MiddlewareAnalysis.AnalysisMiddleware.Invoke(HttpContext httpContext)
at Microsoft.AspNetCore.MiddlewareAnalysis.AnalysisMiddleware.Invoke(HttpContext httpContext)
at Encoo.Console.Framework.Middleware.TraceMiddleware.InvokeAsync(HttpContext context) in /src/Common/Framework/ServiceCore/Middleware/TraceMiddleware.cs:line 22
at Microsoft.AspNetCore.MiddlewareAnalysis.AnalysisMiddleware.Invoke(HttpContext httpContext)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)

@NickCraver
Collaborator

Could you please try the 2.2.50 release? Your report lists version 2.2.50, but the error message shows that the running version is 2.1.30. We shipped several connection fixes in 2.2.50 for exactly this sort of situation, to reconnect faster and more reliably.
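If it helps, here is a hypothetical diagnostic snippet (not from this issue) to confirm which StackExchange.Redis assembly is actually loaded at runtime, for comparison with the "v:" value in the exception text:

```csharp
using System;
using System.Reflection;
using StackExchange.Redis;

// Print where the StackExchange.Redis assembly was loaded from and its version,
// to verify the deployed app really picked up the upgraded package.
var asm = typeof(ConnectionMultiplexer).Assembly;
var info = asm.GetCustomAttribute<AssemblyInformationalVersionAttribute>()?.InformationalVersion;

Console.WriteLine($"Loaded from: {asm.Location}");
Console.WriteLine($"Version: {info ?? asm.GetName().Version?.ToString()}");
```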

@Adam-Jin
Author

Sorry, I had actually already tried the latest version of the NuGet package and it still doesn't work. I will post the logs shortly.

@Adam-Jin
Author

[[bis]] [2021-06-28 02:54:12.038] [err] [production] [apigateway] [314e94a273614bc8a0b899a6c7361e64] Connection id "0HM9PRUQIPGJN", Request id "0HM9PRUQIPGJN:0000000D": An unhandled exception was thrown by the application. StackExchange.Redis.RedisTimeoutException: Timeout performing GET (5000ms), next: PING, inst: 0, qu: 0, qs: 3, aw: False, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: console-net_redis:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: c229e37f06c7, IOCP: (Busy=0,Free=1000,Min=200,Max=1000), WORKER: (Busy=1,Free=32766,Min=200,Max=32767), v: 2.2.50.36290 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2848
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/RedisBase.cs:line 54
at StackExchange.Redis.RedisDatabase.StringGet(RedisKey key, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 2409
at Encoo.Console.ClusterServices.ApiGateway.Caching.CacheManager`1.Get(String key, String region) in /src/ClusterServices/Gateways/ApiGateway/Caching/CacheManager.cs:line 55
at Encoo.Console.ClusterServices.ApiGateway.AuthorizeMiddleware.InvokeAsync(HttpContext context) in /src/ClusterServices/Gateways/ApiGateway/Middleware/AuthorizeMiddleware.cs:line 72
at Microsoft.AspNetCore.MiddlewareAnalysis.AnalysisMiddleware.Invoke(HttpContext httpContext)
at Encoo.Console.ClusterServices.ApiGateway.AuthorizeHeadersMiddleware.InvokeAsync(HttpContext context) in /src/ClusterServices/Gateways/ApiGateway/Middleware/AuthorizeHeadersMiddleware.cs:line 97
at Microsoft.AspNetCore.MiddlewareAnalysis.AnalysisMiddleware.Invoke(HttpContext httpContext)
at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
at Microsoft.AspNetCore.MiddlewareAnalysis.AnalysisMiddleware.Invoke(HttpContext httpContext)
at Microsoft.AspNetCore.MiddlewareAnalysis.AnalysisMiddleware.Invoke(HttpContext httpContext)
at Microsoft.AspNetCore.MiddlewareAnalysis.AnalysisMiddleware.Invoke(HttpContext httpContext)
at Encoo.Console.Framework.Middleware.TraceMiddleware.InvokeAsync(HttpContext context) in /src/Common/Framework/ServiceCore/Middleware/TraceMiddleware.cs:line 21
at Microsoft.AspNetCore.MiddlewareAnalysis.AnalysisMiddleware.Invoke(HttpContext httpContext)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)

@SimonPapworth

We had to code around this because some disconnects cause the client to never connect again. A colleague changed our code so that it now forces a new connection to be created after a certain number of a particular client exception within a short time window (sketched below).

Of course, that only works when there is a decent amount of traffic 24/7.
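A minimal sketch of that kind of force-reconnect workaround, assuming the trigger is repeated RedisTimeoutException / RedisConnectionException; the class name, thresholds, and connection string are illustrative assumptions, not the actual code referenced above:

```csharp
using System;
using StackExchange.Redis;

// Sketch: recreate the ConnectionMultiplexer after too many Redis errors in a
// short window, since a stalled multiplexer may otherwise never recover.
public static class RedisConnection
{
    private const string ConnectionString = "console-net_redis:6379,abortConnect=False"; // example
    private const int ErrorThreshold = 5;                                 // errors before reconnecting
    private static readonly TimeSpan ErrorWindow = TimeSpan.FromSeconds(30);

    private static Lazy<ConnectionMultiplexer> _muxer = CreateMultiplexer();
    private static int _errorCount;
    private static DateTime _windowStart = DateTime.UtcNow;
    private static readonly object _lock = new object();

    public static IDatabase GetDatabase() => _muxer.Value.GetDatabase();

    // Call this from catch blocks for RedisTimeoutException / RedisConnectionException.
    public static void OnRedisError()
    {
        lock (_lock)
        {
            var now = DateTime.UtcNow;
            if (now - _windowStart > ErrorWindow)
            {
                _windowStart = now;
                _errorCount = 0;
            }

            if (++_errorCount < ErrorThreshold) return;

            // Too many errors in a short time: swap in a fresh multiplexer.
            var old = _muxer;
            _muxer = CreateMultiplexer();
            _errorCount = 0;
            if (old.IsValueCreated)
            {
                try { old.Value.Dispose(); } catch { /* best effort */ }
            }
        }
    }

    private static Lazy<ConnectionMultiplexer> CreateMultiplexer() =>
        new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(ConnectionString));
}
```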

@NickCraver
Collaborator

Forgot to follow up on this - please see #1848 for the platform-level issue and how to resolve it if you encounter this!

@philon-msft
Collaborator

Update: a new version, 2.7.10, has been released, including #2610 to detect and recover from stalled sockets. This should help prevent the situation where connections can stall for ~15 minutes on Linux clients.
