
Requests getting lost when handle_request receives non-selectively #42

Open
ferd opened this issue May 31, 2011 · 14 comments

ferd (Collaborator) commented May 31, 2011

TL;DR: when handle_request/5 is used to deal with a request, control of the socketio_http process is handed over to the user's callback, and queries can get lost if that callback does a non-selective receive during a quick succession of client requests.


I used the handle_request/5 call and had it communicating with non-socket.io processes. As part of its task, the callback had to do a non-selective receive, buffering everything it received for a while to handle later (much like a gen_server would).

However, socketio_http runs the 'handle_request' function calls within its own process. So whenever socketio_http_misultin (or some other instance) calls the socketio_http server for the current connection more than once (which can be simulated by reloading many tabs of the same browser really fast), the 'handle_request' call currently running is the one that receives the {request, 'GET', ...} message, bypassing the socketio_http process altogether. The process model used in this case should be modified to avoid such issues.

In the meantime (and for external readers), I solved the issue by spawning a new process in handle_request, monitoring it, and waiting only for that process's 'DOWN' message. During that time the other process can send data to the socket as much as it wants, but keep-alive/pipelined calls are forced to be serialized when they're in the same section. This exact model should probably be moved from my own application into socket.io-erlang itself (to be discussed).

ferd (Collaborator) commented Jun 1, 2011

One question I meant to ask: if we go with my solution for the whole thing, do we accept duplicating the 'Req' value with all of its contents? I know misultin already does sandboxing that way -- we'd end up with 3 copies of the socket -- but I see no other good fix that doesn't involve doing things that way. Is it worth punishing everyone for the corner cases where people do it my way, or should we punish the few people who hit this issue by requiring them to read the documentation (which we will need to write at some point)?

sinnus commented Sep 29, 2011

@ferd, could you copy-paste your solution?
Thank you!

ferd (Collaborator) commented Sep 29, 2011

Here you go, @sinnus. The code is a bit customized for some closed-source app I had, and you can disregard some options such as SessionMods and SessionMethod. They were application-specific things.

%% How to start the connection and whatnot
start_link(Options) ->
    SessionMods = proplists:get_value(sessions, Options, []),
    SessionMethod = proplists:get_value(session_method, Options, [{http_cookie, "sessionid"}]),
    Port = proplists:get_value(port, Options),
    {ok, Pid} = socketio_listener:start([{http_port, Port},
            {default_http_handler, {?MODULE, [SessionMods, SessionMethod]}}]),
    EventMgr = socketio_listener:event_manager(Pid),
    ok = gen_event:add_handler(EventMgr, ?MODULE, [SessionMods, SessionMethod]).

%% Default HTTP Handler
handle_request(_Method, _Path, Req, SessionMods, SessionMethod) ->
    %% Call some function to handle this HTTP call!
    %% This depends on messy knowledge about misultin, sadly.
    %% This spawning business is necessary due to the following
    %% socket.io issue: https://github.com/yrashk/socket.io-erlang/issues/42
    Pid = spawn(fun() -> (my_module:loop(SessionMods, SessionMethod))(Req) end),
    Ref = erlang:monitor(process, Pid),
    receive
        {'DOWN', Ref, process, _, _} -> ok
    end.

You can see the trick starts at Pid = spawn(...) and finishes with the receive block. Let me know if you need more explanations.

sinnus commented Sep 29, 2011

@ferd, I can't understand how messages can get lost. Could you explain?

ferd (Collaborator) commented Sep 29, 2011

New socket calls are received as messages by the process controlling the TCP socket. Misultin dispatches a process for that, and socket.io's code sits in it to redirect the messages to the right spot.

So if the handle_request/N function is long-running (it could take a few seconds), sits in the same process as socket.io (because it's a callback) and thus in the misultin process, and uses non-selective receives, then when more data comes from the socket (say, pipelined calls), the long-running function will consume these messages instead of leaving them to the socket.io process that usually deals with them.

These messages were likely meant for the socketio process in charge of handling queries, but might have been consumed or discarded by your handler. That means that on the other end of the request (the browser), the query will never be replied to and will time out.

The same can happen with messages sent to the socketio process in charge of handling queries that are unrelated to sockets (internal communications). Because of this, I had to split my queries into two processes. On one hand, the handle_request function is guaranteed not to consume messages it wasn't meant to; on the other, by spawning a new process to handle the actual request, I allow myself to do and receive whatever I want in my_module:loop/N.
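To make the failure mode concrete, here is a minimal, self-contained sketch (the module and function names are hypothetical, not part of socket.io-erlang or misultin) of how a catch-all receive inside a callback consumes a message that a later selective receive in the same process was supposed to handle:

```erlang
-module(mailbox_theft).
-export([demo/0]).

%% Returns {WhatTheCallbackStole, WhatTheMainLoopSaw}.
demo() ->
    %% Simulate another actor sending a request meant for this
    %% process's main loop (e.g. a pipelined HTTP call).
    self() ! {request, 'GET', "/index.html"},
    %% A long-running callback doing a non-selective receive
    %% consumes the message first...
    Stolen = callback_with_catch_all_receive(),
    %% ...so the main loop's selective receive never sees it,
    %% and the request is effectively lost.
    Seen = receive
               {request, _, _} -> got_request
           after 0 -> lost
           end,
    {Stolen, Seen}.

callback_with_catch_all_receive() ->
    receive
        Any -> Any            %% matches anything, including {request, ...}
    after 100 -> nothing
    end.
```

Running `mailbox_theft:demo()` returns `{{request,'GET',"/index.html"}, lost}`: the callback walked off with the message, and the owner's own receive times out empty-handed.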

sinnus commented Sep 29, 2011

@ferd, is it a Misultin issue that messages get intercepted when a function takes a long time?

ferd (Collaborator) commented Sep 29, 2011

I don't think we could pinpoint the problem to a single place. It comes from the interaction of many logical actors (Misultin's side, socket.io-erlang's side, the callback's side) sharing the same mailbox. You could fix the issue on any side, but I don't know of a way to do it that doesn't require adding processes.

sinnus commented Sep 29, 2011

@ferd, thank you for your answers! As I understand it, the problem is that the mailbox is "shared" between the sides.

ferd (Collaborator) commented Sep 29, 2011

Exactly. Ownership of messages for each query or worker then becomes extremely vague. Misultin doesn't read messages itself, but you can have conflicts between your callbacks and socket.io-erlang. Using processes the way I did solves the problem.

sinnus commented Sep 30, 2011

Hello, @ferd! I tested a "long running" callback for longpolling and websocket using timer:sleep:

-module(socketio_transport_websocket).

    ...
%% Websockets
handle_call({websocket, Data, _Ws}, _From, #state{ heartbeat_interval = Interval, event_manager = EventManager } = State) ->
    error_logger:info_msg("Before timer~n", []),
    timer:sleep(1000),
    error_logger:info_msg("After timer~n", []),
    ...

-module(socketio_transport_polling).

    ...
%% Incoming data
handle_call({_TransportType, data, Req}, From, #state{ server_module = ServerModule,
                                                       event_manager = EventManager,
                                                       polling_duration = Interval,
                                                       sup = Sup } = State) ->
    error_logger:info_msg("Before timer~n", []),
    timer:sleep(1000),
    error_logger:info_msg("After timer~n", []),
    ...

I used the flash socket.io client to send data to the server, and all messages were processed on the server side. I only ran into the longpolling timeout when timer:sleep > longpolling timeout.
Can you help me write a test to reproduce the issue?

ferd (Collaborator) commented Sep 30, 2011

I'm not sure I understand. You're trying to reproduce the issue I had?

sinnus commented Sep 30, 2011

Yes, I'm trying to reproduce the issue without "reloading many tabs of the same browser really fast".

sinnus commented Sep 30, 2011

@ferd, I have reproduced your issue. My test code above was in the wrong place because of a misunderstanding, sorry.

My steps to reproduce:

  1. Open index.html
  2. Send data from flash client
  3. No echo on client side
  4. After 10 seconds (timeout) the echo will be received

handle_request('GET', [], Req) ->
    error_logger:info_msg("Before timer", []),
    timer:sleep(10000),
    error_logger:info_msg("after timer", []),
    Req:file(filename:join([filename:dirname(code:which(?MODULE)), "index.html"]));

sinnus commented Sep 30, 2011

As we can see, handle_request('GET', [], Req) blocks the socket.io server. All clients will be stuck waiting on the handle_request function.

sinnus added a commit to sinnus/socket.io-erlang that referenced this issue Oct 2, 2011
sinnus added a commit to sinnus/socket.io-erlang that referenced this issue Oct 3, 2011
sinnus added a commit to sinnus/socket.io-erlang that referenced this issue Oct 3, 2011