TCP [RST] intermittently ignored #25314
Comments
This issue (and this one: #14747) makes me wonder if libuv is the best way to implement socket IO in Julia. Microsoft now has their own implementation of Perhaps it would be better for Julia's network IO layer to be built on BSD sockets + epoll/kevent and use something like WSL to provide compatibility with Windows. It is frustrating to spend time figuring out what libuv is doing when debugging Julia IO stuff. The libuv documentation is thin and often says "See the Linux man page for more". It often feels like it would be easier to work directly with the well-defined BSD/Linux APIs that I know. Moving the main event loop from libuv to Julia might also help with other event-related stuff: #22631 #13763.
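For reference, here is a minimal sketch of the behaviour one would expect from a plain BSD sockets + epoll/kevent loop when the peer resets a connection. It is written in Python rather than Julia or C purely for brevity, and it is not how libuv or Julia implement this. The function name `reset_is_reported` and the loopback setup are made up for the illustration; `SO_LINGER` with a zero timeout is used only as a convenient way to make the peer send `[RST]`.

```python
import selectors
import socket
import struct


def reset_is_reported(timeout=5.0):
    """Report what a readiness-based loop sees after the peer sends RST."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()

    # SO_LINGER with l_onoff=1 and l_linger=0 makes close() abort the
    # connection, so the kernel sends RST instead of the usual FIN.
    peer.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    peer.close()

    client.setblocking(False)
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on macOS/BSD
    sel.register(client, selectors.EVENT_READ)
    try:
        if not sel.select(timeout):
            # No wake-up at all: the symptom described in this issue.
            return "no event"
        try:
            client.recv(4096)
        except ConnectionResetError:
            return "ECONNRESET"
        return "data or EOF"
    finally:
        sel.close()
        client.close()
        listener.close()
```

The point of the sketch is only that the reset is normally delivered as a readiness event on the socket, after which the read fails with `ECONNRESET`. The "no event" branch corresponds to the stuck case reported here.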
They've had it for years; it's used by libuv. WSL is entirely tangential to this; what matters is whether the underlying driver being used is WSK. The old API (which did not support epoll) was deprecated in Windows 7, although I know of at least one corporate firewall that tries to prevent user programs from accessing the new subsystem (as of a couple of years ago when I last checked).
WSK implements
Usually it just uses IOCP, since presumably that's faster. But it looks like the original PoC repo has even been getting new updates recently: https://github.com/piscisaureus/wepoll
Can you provide any update here?
Most of the time when `TCPSocket` receives a `[RST]` packet, libuv calls `uv_readcb()` and a `UVError`, `ECONNRESET` is thrown.

I have a test case where hundreds of pipelined HTTP PUT Requests are sent to AWS S3. Typically the Requests get ahead of the Responses (e.g. when Request No. 70 is being sent, we may only be up to reading Response No. 10).

At some point the S3 server hits an internal limit on the number of Requests per connection (about 100) and stops sending Response data (e.g. we might send Request No. 120 and then, while we're reading Response No. 30, data stops arriving). Sometimes the server sends `[RST]` right away and a `UVError`, `ECONNRESET` is thrown as expected. Note: the S3 doc suggests not to send more than 90 requests per connection. I'm sending more than that as a way to test corner case behaviour in HTTP.jl.

However, monitoring with wireshark shows that sometimes the `[RST]` is not sent for a few minutes. It seems that in this case libuv does not notice the `[RST]`, and `uv_readcb` is not called. The result is that the `eof()` call that the reader is waiting for blocks forever. I have a separate task that periodically prints connection debug info. This shows that the `LibuvStream.state` remains `StatusActive`.

I have tried putting lots of printfs in libuv. What I see is that the `uv__stream_io` function is not called at all in the case where the `[RST]` is missed. Maybe there is a race condition inside libuv where the `[RST]` is missed if `kevent` is not active when it arrives? Maybe for some reason libuv forgets to submit the socket to `kevent`, or does not indicate interest in the correct event type? (I'm not familiar with kqueue.)

I have tried modifying `wait_readnb` so that it wakes up and does `uv_read_start` again every so often while waiting. This makes no difference.

As a practical solution for HTTP.jl I've implemented a Retry Layer that uses a separate task to close stuck connections. Calling `close` results in the blocked `eof()` task waking up, discovering the connection is gone, and retrying the Request.
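The shape of that workaround can be sketched roughly as follows. This is not the HTTP.jl Retry Layer; it is a small Python `asyncio` analogue of the same idea, with a hypothetical function name (`read_with_watchdog`) and a hypothetical `idle_timeout` parameter. A second task watches for inactivity and closes the connection, which wakes the reader that is blocked waiting for data.

```python
import asyncio


async def read_with_watchdog(host, port, idle_timeout):
    """Read until EOF; a watchdog task closes the connection if it goes quiet.

    Returns (bytes_received, closed_by_watchdog) so the caller can decide
    whether to retry the request on a fresh connection.
    """
    reader, writer = await asyncio.open_connection(host, port)
    loop = asyncio.get_running_loop()
    last_activity = loop.time()
    closed_by_watchdog = False

    async def watchdog():
        nonlocal closed_by_watchdog
        while True:
            await asyncio.sleep(idle_timeout / 4)
            if loop.time() - last_activity > idle_timeout:
                closed_by_watchdog = True
                # Closing the transport feeds EOF to the stream reader,
                # which wakes the blocked reader.read() call below.
                writer.close()
                return

    task = asyncio.create_task(watchdog())
    received = b""
    try:
        while True:
            chunk = await reader.read(4096)
            if not chunk:  # EOF: either the peer closed or the watchdog did
                break
            received += chunk
            last_activity = loop.time()
    finally:
        task.cancel()
        writer.close()
    return received, closed_by_watchdog
```

A caller would check the second element of the result and resend the request when it is `True`. The idle timeout has to be longer than any legitimate pause in the response stream, otherwise healthy connections get closed too.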