-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add optional feature to set the SO_REUSEADDR option before binding th… #638
base: master
Are you sure you want to change the base?
Conversation
…e socket Signed-off-by: Fábio Urquiza <[email protected]>
Hey, thanks for the patch. I have couple question.
|
Here's what GPT4 had to say about this: https://chat.openai.com/share/b515e3dc-6193-4550-8d5d-3ecaa8088755 |
Yeah, I'm aware of the risks and the follow article explain it really well: https://hea-www.harvard.edu/~fine/Tech/addrinuse.html I think the main risk is to have a socket A receiving packets that should be received by socket B, but I think it doesn't matter much for the load test itself (it would matter for the report in the end, but I think we could add a note about that). |
And this post on stackoverflow talks about the different implementation of the REUSEADDR and REUSEPORT flags on many architectures. In resume the REUSEADDR flag stays the same, while we have some minor variations on the REUSEPORT flag. https://stackoverflow.com/questions/14388706/how-do-so-reuseaddr-and-so-reuseport-differ |
About the suggestions made by GPT4, something we cannot or are not allowed to modify sys net ipv4 parameters. For example, decreasing the tcp_fin_timeout will also affect other services, which may not be desirable. |
I need to convince myself that this is safe, and so far I haven't been able to. Maybe you can help me get there. If someone runs into "bind: address already in use" even given the fact that we're using HTTP keep alive underneath and re-using TCP connections, it like means the server is very slow to respond, or the network really faulty, or a mix. In either case, the TCP connection would still be open, and the client reading a response, so re-using the local address seems unsafe? Or is it only applicable when the underlying TCP connection is in TIME_WAIT state? In any case, I'd like to write some sort of integration test that exercises this and gives us some more confidence. |
Exactly. On our use case we were executing a load test to a load balancer composed of 8 front end servers. Those servers queries a route table located on another server and do a proxy_pass to one internal server out of 20. We were trying to find out what happens when the communication to the route table service becomes unresponsible. Because how the retries were implemented a request that took milliseconds to be responded went to up 5 seconds. We wanted to know if the front end servers would crash or also become unresponsible but we started to runs into "bind: address already in use". The setting of the reuse option fixed this on our case. It only allows the reuse of TIME_WAIT sockets when you are binding to different combination of remote IP address and port. So if you have a socket in TIME_WAIT with the combination LOCAL_ADDR_A:PORT_A <=> REMOTE_ADDR_A:PORT_X and try the same combination again it will not allow, but if use a combination with another remote IP address or port like LOCAL_ADDR_A:PORT_A <=> REMOTE_ADDR_B:PORT_X it will reuse the LOCAL_ADDR_A:PORT_A when binding. |
If you're hitting "bind: address in use" issues in your load tests, it typically means you're exhausting the capabilities of the network stack on the host. SO_REUSEADDR and SO_REUSEPORT won't meaningfully impact those constraints. And, in any case, it's a situation that you would — should! — never encounter in a normal deployment, so it doesn't make sense to try to accommodate in a edit: Instead, you want to make sure that |
Background
Add an optional parameter to set the SO_REUSEADDR option on the TCP socket before binding
it (default false). Can reduce the amount of "bind: address already in use" errors when doing many
connections with the same address and port combination, like the ones that happened on #583.
Checklist