Reconnect issue after downtime - "Resource temporarily unavailable" #2286

Closed
alandoherty opened this issue Oct 1, 2023 · 3 comments · Fixed by #2287
Labels
bug Something isn't working

Comments

alandoherty commented Oct 1, 2023

What version of gRPC and what language are you using?

Grpc.Net.Client 2.56.0

What operating system (Linux, Windows,...) and version?

Debian 11
Linux --redacted-- 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64 GNU/Linux

What runtime / compiler are you using (e.g. .NET Core SDK version dotnet --info)

.NET 7, published as ReadyToRun - so I'm not sure how to obtain this information.

What did you do?

We recently had a significant period of downtime while our networking was being migrated for about 2 hours, and found that the GrpcChannel wasn't automatically reconnecting after the downtime ended.

What did you expect to see?

Automatic reconnect.

What did you see instead?

We were seeing a strange error:

Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel.", DebugException="System.Net.Sockets.SocketException: Resource temporarily unavailable")
---> System.Net.Sockets.SocketException (11): Resource temporarily unavailable
  at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
  at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
  at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|281_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
  at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context)
--- End of inner exception stack trace ---
  at Grpc.Net.Client.Balancer.Internal.ConnectionManager.PickAsync(PickContext context, Boolean waitForReady, CancellationToken cancellationToken)
  at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
  at Grpc.Net.Client.Internal.GrpcCall`2.RunCall(HttpRequestMessage request, Nullable`1 timeout)
  at Grpc.Net.Client.Internal.Retry.RetryCallBase`2.GetResponseAsync()
  -- REDACTED --

There aren't any issues with the network that I can see. I restarted the .NET application on one of the boxes encountering this issue, and the problem went away.

This channel is used very frequently, and throughout the downtime (and continuing after it) would no doubt have generated many calls to TryConnectAsync.

My best guess is that this is an issue with creating too many Socket instances that are cleaned up too slowly. Looking through the code, it looks like the socket created at

socket = new Socket(SocketType.Stream, ProtocolType.Tcp) { NoDelay = true };

isn't disposed if the connection attempt fails.

This isn't an area I'm super familiar with, so this is just a best guess on my side. Let me know if I can provide any more details.
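To illustrate what I mean, here is a hypothetical helper (not the actual SocketConnectivitySubchannelTransport code) showing the pattern I'd expect - dispose the socket whenever the connect attempt throws, so failed reconnect attempts don't accumulate open descriptors:

using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

static async Task<Socket> ConnectOrThrowAsync(EndPoint endpoint, CancellationToken cancellationToken)
{
    var socket = new Socket(SocketType.Stream, ProtocolType.Tcp) { NoDelay = true };
    try
    {
        await socket.ConnectAsync(endpoint, cancellationToken);
        return socket;
    }
    catch
    {
        // Without this, a failed attempt leaves an unconnected Socket alive until
        // finalization, which can slowly exhaust file descriptors during an outage.
        socket.Dispose();
        throw;
    }
}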

Anything else we should know about your project / environment?

We create one channel for the lifetime of this application.
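A simplified sketch of how the channel is created (the real address and generated client types are redacted/placeholders; this is just to show that a single GrpcChannel is shared for the whole process):

using Grpc.Net.Client;

public static class GrpcClients
{
    // One channel for the lifetime of the application, created at startup and
    // reused by every call site.
    public static readonly GrpcChannel Channel = GrpcChannel.ForAddress("http://--redacted--:5000");
}

// Clients are cheap to create from the shared channel per call site, e.g.:
// var client = new MyService.MyServiceClient(GrpcClients.Channel);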

alandoherty added the bug label on Oct 1, 2023
JamesNK (Member) commented Oct 1, 2023

Thanks for reporting and looking into this.

I researched what can cause "Resource temporarily unavailable", and the pages I found mentioned sending data to a full buffer. That isn't the case here, but I can imagine that not disposing broken sockets could cause this issue.

Fixed here: #2287.
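If anyone wants to confirm the descriptor-leak theory on a running process, a rough Linux-only check (my suggestion, not anything built into Grpc.Net.Client) is to watch the open file descriptor count over time; `ls /proc/<pid>/fd | wc -l` from a shell gives the same number for a specific pid:

using System;
using System.IO;

// Counts the entries in /proc/self/fd, i.e. this process's open file descriptors.
// If failed connect attempts are leaking sockets, this number keeps climbing
// for the duration of an outage instead of staying roughly flat.
int openDescriptors = Directory.GetFileSystemEntries("/proc/self/fd").Length;
Console.WriteLine($"Open file descriptors: {openDescriptors}");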

someview commented Nov 3, 2023

(quotes the original issue report above in full)

We have met this problem too, but we cannot be sure it is an issue with gRPC.

@someview

(quotes JamesNK's reply above)

Is there any other reason that could cause this? We have hit it even after upgrading gRPC to 2.58.


Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel.", DebugException="System.Net.Sockets.SocketException: Resource temporarily unavailable")
---> System.Net.Sockets.SocketException (11): Resource temporarily unavailable
  at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
  at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
  at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|281_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
  at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context)
--- End of inner exception stack trace ---
  at Grpc.Net.Client.Balancer.Internal.ConnectionManager.PickAsync(PickContext context, Boolean waitForReady, CancellationToken cancellationToken)
  at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
  at Grpc.Net.Client.Internal.GrpcCall`2.RunCall(HttpRequestMessage request, Nullable`1 timeout)
  at ProtoBuf.Grpc.Internal.Reshape.UnaryTaskAsyncImpl[TRequest,TResponse](AsyncUnaryCall`1 call, MetadataContext metadata, CancellationToken cancellationToken) in /_/src/protobuf-net.Grpc/Internal/Reshape.cs:
  at TL.RoomService.Business.Implement.RoomQueueBusiness.Distinct_ExitUserRoom(QueueMsg_UserRoomExit queueMsg) in /home/jenkins/agent/workspace/TL-后端构建总控-4/script/backend/TL.RoomService/Source/Business/I
  at TL.Queue.QueueManageBase.OnReceiveAsync(QueueMsg queueMsg)
