Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Augment gen_server timeout handling #9287

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Maria-12648430
Copy link
Contributor

While working on something entirely different, it ocurred to me that timeouts in gen_server have some drawbacks and pitfalls.

When the return of gen_server:init or a handle_* callback contains a timeout, the next iteration will wait for a message for that time, and if no message arrives, instantly call handle_info with the message timeout (ie, not sending itself a timeout message).

For one, this convolution of handle_info for message and timeout handling makes it impossible to distinguish an actual timeout from the message timeout having been received. This means that you either have to make sure that, if you are using timeouts, such a message will not be sent to your gen_server implementation, or live with possible premature "fake" timeouts.

For another, it is very inconvenient if one wants to enricht the timeout with some supplementary data. This is only possible by putting it somewhere in the state, and must likely be removed from the state again if a message arrives before the timeout occurs.

Another option is to manually maintain timers, which is also inconvenient and requires quite some boilerplate code.

The changes in this PR allow init and handle_* callbacks to return the tuple {timeout, Time, TimeoutMessage}. In case a timeout occurs, handle_info will instantly be called with the message {timeout, TimeoutMessage}.

There are no tests or documentation yet, I first want to see if there is any interest in this feature at all.

Copy link
Contributor

github-actions bot commented Jan 13, 2025

CT Test Results

    2 files     96 suites   1h 8m 11s ⏱️
2 174 tests 2 127 ✅ 47 💤 0 ❌
2 537 runs  2 488 ✅ 49 💤 0 ❌

Results for commit 6c2ef49.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@RaimoNiskanen
Copy link
Contributor

Hi!

You point out a bunch of design quirks/flaws with the gen_server time-out. There are at least two more:

  • Receiving a system message restarts the time-out, so it may become longer than requested.
  • A time-out overrides the hibernate_after time-out, so the gen_server won't hibernate until after the time-out has elapsed.

But this PR only addresses one problem, namely attaching supplementary data to the time-out.

The PR's suggested form for the new time-out is {timeout, Time, TimeoutMessage} that looks exactly like gen_statem's {timeout, Time, EventContent}, which makes it familiar. That can be both good and bad... Good because it is familiar, and bad because it still is subtly different from gen_statem's time-out.

So that means that if this PR is accepted we have another flawed time-out in gen_server, and we cannot use that time-out form for a gen_statem time-out equivalent. Not that we would want to (i think ever) do that.

I lean towards not augmenting the gen_server time-out. It is what it is. Augmenting it with gen_statem time-outs would not be a small rewrite. If you want something better, use gen_statem or roll your own with erlang timers according to your current needs.

@@ -2264,6 +2283,15 @@ loop(Parent, Name, State, CbCache, infinity, HibernateAfterTimeout, Debug) ->
loop(Parent, Name, State, CbCache, hibernate, HibernateAfterTimeout, Debug)
end;

loop(Parent, Name, State, CbCache, {timeout, Time, TimeoutMsg} = Timeout, HibernateAfterTimeout, Debug) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps Time =:= infinity should be handled here instead of in handle_msg/7, handle_common_reply/8 below, and two places above; init_it/5 and enter_loop/5 (missing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I had it this way before, then thought having it in the handle_msg and friends functions would be better. But you're right, especially when combined with what @michalmuskala suggested. I'll change it back later.

@RaimoNiskanen
Copy link
Contributor

Then again, this particular enhancement doesn't change any of the good qualities about the gen_server time-out. It is still simple and light-weight.

So the question is how much better would the gen_server time-out be with the addition of supplementary data...? It might be worth it.

@michalmuskala
Copy link
Contributor

I mostly agree with what Raimo said, the only thing I would consider changing, if this is going forward would be to have the message delivered be just Msg for {timeout, Time, Msg} instead of {timeout, Msg}. If the user wants to wrap it, they can and it gives more flexibility if that's not necessary. It also gives a direct translation for the current option as {timeout, Time, timeout}

@Maria-12648430
Copy link
Contributor Author

@RaimoNiskanen

There are at least two more

I know, but I don't think that much can be done about those =( In any case, I wasn't trying to with this PR ;)

Then again, this particular enhancement doesn't change any of the good qualities about the gen_server time-out. It is still simple and light-weight.

So the question is how much better would the gen_server time-out be with the addition of supplementary data...? It might be worth it.

To be sure, I took care not to change anything that would affect the current way how timeouts work in gen_server. Ie, anything that works now will keep working just the same with these changes applied. It also doesn't stand in the way of addressing the other 2+ quirks you mentioned, if we ever want to do that.


@michalmuskala

I mostly agree with what Raimo said, the only thing I would consider changing, if this is going forward would be to have the message delivered be just Msg for {timeout, Time, Msg} instead of {timeout, Msg}. If the user wants to wrap it, they can and it gives more flexibility if that's not necessary. It also gives a direct translation for the current option as {timeout, Time, timeout}

Excellent point, this is much better. Thanks for the suggestion 🤗


I think I'll just let this discussion run for a while longer before making any changes to it, though 🙂

@u3s u3s added the team:PS Assigned to OTP team PS label Jan 14, 2025
@RaimoNiskanen
Copy link
Contributor

@Maria-12648430 wrote:

It also doesn't stand in the way of addressing the other 2+ quirks you mentioned, if we ever want to do that.

But in a way it does. If we would like to introduce the gen_statem time-outs in gen_server, that addresses most or all of the mentioned time-out issues, then this suggested format is one of the gen_statem formats. So if we implement this PR, that gen_statem format has been occupied by a subtly different look-alike.

For instance; we could instead of this minimal suggestion, re-use that Timeout loop variable in gen_server:loop to be a (wrapped?) reference() to a running erlang timer. I guess such an implementation could fix all gen_server time-out quirks except being possible to fake (but just like this suggestion decrease the impact of that quirk since the time-out message could be chosen arbitrarily).

@Maria-12648430
Copy link
Contributor Author

@RaimoNiskanen ok, maybe I got you a bit wrong in the beginning.

So I understand that you are not against adding the functionality this PR provides, but that you would like it to also address the other (at least two) quirks you mentioned, ie that adding only the ability to transmit timeout data falls short and may even get in the way of adressing the other issues later?

Tackling the other issues would mean changing gen_servers timeout handling to be more like how gen_statem handles event timeouts. Going this way is IMO not a bad thing, aligning them more closely, but also more effort. OTOH, looking at gen_statem, it is actually a pretty straightforward thing once you can see through all the layers of "gen_statem fancyness" around it. I think it can be done, and we would be willing to give it a try.

That is, unless you say that it's unlikely to be accepted, then of course we will save ourselves the trouble 😜

@juhlig
Copy link
Contributor

juhlig commented Jan 15, 2025

we would be willing to give it a try

What do you mean, "we"? Count me in 👍

@RaimoNiskanen
Copy link
Contributor

@Maria-12648430 Precisely. I would like that when adding a time-out that looks like in gen_statem, it could be annoying if it doesn't behave like the same gen_statem time-out, which is the event time-out.

I would also like to not slow down gen_server, its engine keeps all its internal state in gen_server:loop/7 variables, so if it turns out to be possible to implement an erlang timer based event time-out through re-use of the Time argument, then the engine should run as before. Except that slightly more garbage would be produced if the callback module uses the new time-out.

I also would like that all quirks are addressed, not just the two I mentioned. But I guess it is impossible to completely fix fakeability without adding a new gen_server callback, so enabling a free choice of time-out message I'd say would be good enough.

So far I have not gotten any opinions from the rest of the team, I will have to get back to you about if it would be likely to be accepted...

@Maria-12648430
Copy link
Contributor Author

I also would like that all quirks are addressed, not just the two I mentioned.

Ok, well then... what are the others? 😅

@Maria-12648430
Copy link
Contributor Author

But I guess it is impossible to completely fix fakeability without adding a new gen_server callback, so enabling a free choice of time-out message I'd say would be good enough.

Indeed. In gen_statem, timeout events can be clearly distinguished from other events, external or internal. In gen_server, we would have to introduce a dedicated (albeit optional) callback like handle_timeout/2, which even with a fallback to handle_info would break any existing gen_server implementation which happens to have such a function (exported) already, which is not all that unlikely. The only way to mitigate this (that I can see) is to have another (optional) callback like what callback_mode does for gen_statem, which among other things could return how timeouts are to be handled (like, {timeout, handle_info} (current behavior, default) vs {timeout, handle_timeout}). Which AFAIK is a step on the road to over-engineering things, you are only a few steps away from just using gen_statem then (but on a side note, the "entirely different" thing I was working on as mentioned in the OP was the supervisor module, which is in turn a gen_server, which can't be easily exchanged for gen_statem).

So far I have not gotten any opinions from the rest of the team, I will have to get back to you about if it would be likely to be accepted...

Ok, keep us in the loop ➰ 🙂 🤗

@RaimoNiskanen
Copy link
Contributor

I also would like that all quirks are addressed, not just the two I mentioned.

Ok, well then... what are the others? 😅

Hmmm, I had the feeling you mentioned a few, but it seems you mentioned supplementary data, fakeability, and that the timeout message is just passed in a call instead of sent. My two points are just elaborations of your third. So the known quirks should be:

  • Fakeability (unfortunately it is not a word)
  • No supplementary data
  • Receiving a system message restarts the time-out
  • A time-out overrides the hibernate_after time-out

So, as you point out, by enabling arbitrary time-out messages according to your suggestion, the two quirks I brought up would be the only remaining, and they are a consequence of the current time-outs not being delayed process messages.

@Maria-12648430
Copy link
Contributor Author

@RaimoNiskanen say, is there anything speaking against carrying some of the loop/7 variables in a record instead of distict variables? Some of them, like Parent, Name, HibernateAfterTimeout never change, but they have to be passed around to supplementary functions explicitly all the time, because in the end they end up calling loop/7 again.

I can go with that if that is better for some reason, but it makes the going a little tough, and a record would at least feel a lot better XD

@RaimoNiskanen
Copy link
Contributor

It is in general beneficial to reduce the number of loop arguments, because of the BEAM calling convention. For example in handle_msg/6 where reply/2 is called (which calls gen:reply/2 so it cannot be inlined), then 5..6 arguments have to be pushed onto the stack before the call and restored after since they are needed in the tail recursive call to loop/7 that follows. Reducing unnecessary register moves is desirable.

So keeping Parent, Name, HibernateAfterTimeout and probably CbCache in a record would probably be good, for performance and readability. CbCache only changes after wakeup from hibernate so rebuilding the record then is probably not a problem.

HibernateAfterTimeout has to be looked up from the record almost every time the loop enters a receive statement. A record lookup is slower than having it directly in argument, but that would probably not be possible to notice.

This is the kind of things I looked at when optimizing gen_statem to try to keep up with gen_server. I haven't looked at optimizing gen_server, but there seem to be obvious things to do, like this.

For example I see that the argument order differs between loop/7 and its helper functions such as handle_common_reply/8,9 which should force a lot of register moves. When the argument order is the same the compiler often figures out that it can just keep the argument registers where they are for the call, just having to move about a few. But the code there introduces a new argument as the first instead of as the last.

Maybe helpers like try_handle_continue/3 should be marked for inlining so calling them could avoid pushing all loop arguments.

One would have to look at the generated BEAM assembly to see these undesired register move clusters along the hot path that may be possible to avoid. Finding a balance between keeping volatile state in argument registers versus avoiding lots of register shuffling is the game.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:PS Assigned to OTP team PS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants