grpc-js servers not sending keepalives #2734
Comments
Server-side keepalives were added recently in grpc-js, but they should be working in version 1.10.6. However, they're not logged on the server the way they are on the client, so you can't verify that they're being sent in the same way. What are your observations that suggest that the server is not sending pings? |
Originally it was because GCP Cloud NAT has a 10 minute limit for idle connections with no TCP traffic. It just drops the TCP connection for any VM running on GCE if there are no packets. I had a long-lived gRPC bidi stream that fit this profile, and grpc-js was not reporting a disconnect on either the client or the server. The connection just hangs forever, and grpc-js doesn't realize it's dead even when I later try to use the channel.

On the server: I set grpc.keepalive_time_ms to 6 minutes and grpc.keepalive_timeout_ms to 20 seconds. But it didn't have any effect: the connections still just died without any indication. Then I added logging to server.ts in _sessionHandler, but I didn't observe any logged statements. (Maybe I did it wrong.) And it didn't help the problem.

Moving on to the client: I then added the same keepalive channel options to my gRPC clients and added logging in transport.ts. The logging works, and the keepalives now keep my GCE TCP connections alive even when idle.

As an aside, I often wrap grpc with the nice-grpc package, so I haven't ruled out that dependency as a cause yet.

EDIT: in all cases, there was a long-lived bidi stream. This wasn't a situation where I had an open channel without an ongoing RPC. |
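For readers following along, the settings described in this comment can be written as a plain options object. This is only a sketch: the two option keys and their values (6 minutes, 20 seconds) and the 10 minute NAT limit come from the comment above, while the type name, the constants, and the `pingsBeforeIdleLimit` helper are hypothetical names introduced for illustration.

```typescript
// Sketch of the keepalive settings described above, as a plain object.
// The option keys are grpc-js channel options; everything else is illustrative.
type KeepaliveOptions = {
  'grpc.keepalive_time_ms': number;    // idle time before a keepalive ping is sent
  'grpc.keepalive_timeout_ms': number; // how long to wait for the ping ack
};

const MINUTE_MS = 60_000;

// Idle limit reported for GCP Cloud NAT in this thread.
const NAT_IDLE_LIMIT_MS = 10 * MINUTE_MS;

const keepaliveOptions: KeepaliveOptions = {
  'grpc.keepalive_time_ms': 6 * MINUTE_MS,
  'grpc.keepalive_timeout_ms': 20_000,
};

// Hypothetical helper: a ping can only keep the NAT mapping alive if it is
// sent before the idle limit expires.
function pingsBeforeIdleLimit(options: KeepaliveOptions, idleLimitMs: number): boolean {
  return options['grpc.keepalive_time_ms'] < idleLimitMs;
}

const intervalIsShortEnough = pingsBeforeIdleLimit(keepaliveOptions, NAT_IDLE_LIMIT_MS);
console.log(`keepalive interval shorter than NAT idle limit: ${intervalIsShortEnough}`);
```

With @grpc/grpc-js, an object like this would typically be passed as the options argument when constructing a server (`new grpc.Server(options)`) or as the third argument of a client constructor.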
Haven't forgotten about this. I made a PoC with unary and streaming RPCs using base grpc-js from head just now, and keepalives originating from both the client and server do seem to transmit. I'm trying next to see if somehow my use when going through I've identified a few areas for improvement, hoping you'll consider a PR for them:
|
If you like my changes, I've drafted up some fixes to what I think is the most obvious (to my eyes) potential cause in #2756 |
I'm kind of surprised that no one else has noticed this bug. Especially since GCP's firewall auto-drops connections after 10 minutes of inactivity. Hypothesis:
OTOH, the code inspection plus nodejs/node#18447 seem like strong evidence that the behaviors I was observing are in fact aligned. If you have any thoughts, LMK. I'm happy to submit a second PR for improving keepalives on the client side as well with logging. Getting this right is pretty important for my use case. |
Here's my guess for why this bug wasn't noticed:
|
FWIW, in my original bug I observed that neither the client nor the server was detecting the disconnects. This is because when GCP decided to drop a connection, it was not sending any type of TCP RST to either party. So both of them simply thought the TCP connection was still alive, while the GCP router in the middle silently dropped packets past the 10 minute idle mark with zero notification. I took a stab at #2760 that I hope is easier for you to review and merge. |
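To illustrate why a silent drop like this can only be noticed through a ping timeout, here is a conceptual sketch of a keepalive decision function. It is not grpc-js's actual implementation; the `KeepaliveState` type, the `nextKeepaliveAction` function, and the constants are hypothetical, and the timing values are the ones mentioned earlier in the thread.

```typescript
// Conceptual sketch only: NOT grpc-js's actual implementation.
type KeepaliveState = {
  lastReadMs: number;          // last time any bytes arrived from the peer
  pingSentAtMs: number | null; // when the outstanding ping was sent, if any
};

type KeepaliveAction = 'wait' | 'send_ping' | 'close_connection';

function nextKeepaliveAction(
  state: KeepaliveState,
  nowMs: number,
  keepaliveTimeMs: number,
  keepaliveTimeoutMs: number,
): KeepaliveAction {
  if (state.pingSentAtMs !== null) {
    // A ping is outstanding: with no ack inside the timeout, treat the peer as unreachable.
    return nowMs - state.pingSentAtMs >= keepaliveTimeoutMs ? 'close_connection' : 'wait';
  }
  // No ping outstanding: send one once the connection has been quiet long enough.
  return nowMs - state.lastReadMs >= keepaliveTimeMs ? 'send_ping' : 'wait';
}

// Silent-drop scenario: no RST ever arrives, so only the missing ack reveals the failure.
const PING_INTERVAL_MS = 6 * 60_000;
const PING_TIMEOUT_MS = 20_000;
const connection: KeepaliveState = { lastReadMs: 0, pingSentAtMs: null };

const whenQuiet = nextKeepaliveAction(connection, PING_INTERVAL_MS, PING_INTERVAL_MS, PING_TIMEOUT_MS);
connection.pingSentAtMs = PING_INTERVAL_MS; // the ping goes out, but the path is dead

const whenUnacked = nextKeepaliveAction(
  connection,
  PING_INTERVAL_MS + PING_TIMEOUT_MS,
  PING_INTERVAL_MS,
  PING_TIMEOUT_MS,
);
console.log(whenQuiet, whenUnacked);
```

Either side can run this logic on its own, which is why a side that sends no pings has no way to learn that the path is dead.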
Right, but the client keepalives should detect the disconnect independent of what the server does. And if only the server detects the disconnect, it can't do anything about it, because it can't reestablish the connection or inform the client of the error. |
I think this is resolved by your changes in #2760, which are now out in version 1.10.10. |
I've noticed that grpc-js servers do not seem to be sending keepalives to gRPC clients when they are connected to a long-lived stream. (But only within my larger applications so far)
I'm very happy to create a tiny PoC and then upload it here, but before I build it, I wanted to clarify whether grpc-js ChannelOptions are in fact intended to let gRPC servers send PINGs to clients, or whether this was only meant for clients and I misunderstood the docs. I can confirm that clients are happily sending PINGs as expected in 1.10.6.
For example, I've instrumented the grpc-js server.ts file to log when pings may happen, but I don't see it. Server has long-lived bidi streams from a client and the server is configured with:
... and no pings are sent and no keepalive disconnects will occur, in my app so far.
From https://grpc.io/docs/guides/keepalive/ it looks like the Availability column lists "Client and Server" for keepalives. And https://github.com/grpc/grpc/blob/master/doc/keepalive.md describes what happens "if the ping is not acknowledged by the peer". I haven't ruled out a bug in my own setup yet, but I would like to verify that the intention is that the client and the server can each independently set keepalives and transmit their own PINGs. If so, I'll be happy to make a PoC and continue this bug with my findings.
Thank you for your time!