Add missing cluster singleton detection #7363

Merged

Contributor

@Arkatufus Arkatufus commented Oct 23, 2024

At a user's request, add missing-singleton detection to ClusterSingletonProxy to help debug misconfigured singleton setups.

Detection timer:

  • starts when the proxy actor starts
  • restarts when the singleton actor ref is lost
  • repeatedly logs a timeout message and publishes a timeout event until the singleton is found
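
The timer lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `MissingSingletonDetector` actor, its `SingletonFound`, `SingletonLost`, and `DetectionTick` messages, the timer key, and the warning log level are all hypothetical names and choices made for the example. Only `IWithTimers`, `Timers.StartPeriodicTimer`, and `Timers.Cancel` are existing Akka.NET APIs.

```csharp
using System;
using Akka.Actor;
using Akka.Event;

// Hypothetical actor that illustrates the timer lifecycle only;
// this is NOT the actual ClusterSingletonProxy implementation.
public sealed class MissingSingletonDetector : ReceiveActor, IWithTimers
{
    // Hypothetical messages used only by this sketch.
    public sealed record SingletonFound(IActorRef Singleton);
    public sealed record SingletonLost;
    private sealed record DetectionTick;

    private const string TimerKey = "missing-singleton-detection";

    private readonly ILoggingAdapter _log = Context.GetLogger();
    private readonly TimeSpan _detectionPeriod;
    private IActorRef? _singleton;

    // Populated by Akka.NET for actors that implement IWithTimers.
    public ITimerScheduler Timers { get; set; } = null!;

    public MissingSingletonDetector(TimeSpan detectionPeriod)
    {
        _detectionPeriod = detectionPeriod;

        // Singleton resolved: remember it and stop the detection timer.
        Receive<SingletonFound>(found =>
        {
            _singleton = found.Singleton;
            Timers.Cancel(TimerKey);
        });

        // Singleton ref lost: clear it and start detecting again.
        Receive<SingletonLost>(_ =>
        {
            _singleton = null;
            StartDetectionTimer();
        });

        // Periodic tick: report until the singleton is found.
        Receive<DetectionTick>(_ =>
        {
            if (_singleton is not null)
            {
                // A tick raced with a successful identification; ignore it.
                Timers.Cancel(TimerKey);
                return;
            }

            _log.Warning("No singleton found after {0} seconds.",
                _detectionPeriod.TotalSeconds);
        });
    }

    // The timer also starts when the actor itself starts.
    protected override void PreStart() => StartDetectionTimer();

    // Starting a timer with an existing key replaces the previous timer.
    private void StartDetectionTimer() =>
        Timers.StartPeriodicTimer(TimerKey, new DetectionTick(), _detectionPeriod);
}
```

A periodic timer is used here because the description says the timeout keeps firing until the singleton is found; a single-shot timer would report only once.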

Changes

  • Add a new HOCON setting to opt out of this feature
  • Add a new HOCON setting for the detection period (defaults to 5 minutes)
  • Add detection code to ClusterSingletonProxy
  • Add a unit test
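
For illustration, the two new settings might be configured along these lines. The key names below are assumptions made for this sketch (the second is guessed from the `SingletonIdentificationFailurePeriod` settings property used in the detection code); check the `akka.cluster.singleton-proxy` section of the reference configuration for the actual names.

```hocon
akka.cluster.singleton-proxy {
  # Existing setting, shown in this PR's diff context.
  singleton-identification-interval = 1s

  # ASSUMED key name: set to false to opt out of missing-singleton detection.
  log-singleton-identification-failure = true

  # ASSUMED key name: how long the proxy waits before reporting a missing singleton.
  # The value matches the 5 minute default stated in the description.
  singleton-identification-failure-period = 5m
}
```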

@Aaronontheweb Aaronontheweb added this to the 1.5.31 milestone Nov 11, 2024
Member

@Aaronontheweb Aaronontheweb left a comment

Left some comments

Success = 0,
Timeout = 1,
}
public sealed class IdentifySingletonResult : Akka.Actor.INoSerializationVerificationNeeded
Member

Should probably make this and the enum private so it doesn't appear in the public API surface

@@ -61,6 +61,12 @@ akka.cluster.singleton-proxy {

# Interval at which the proxy will try to resolve the singleton instance.
singleton-identification-interval = 1s

Member

Config changes look good

@@ -37,7 +37,7 @@ namespace Akka.Cluster.Tools.Singleton
/// Note that this is a best effort implementation: messages can always be lost due to the distributed nature of the actors involved.
/// </remarks>
/// </summary>
- public sealed class ClusterSingletonProxy : ReceiveActor
+ public sealed class ClusterSingletonProxy : ReceiveActor, IWithTimers
Member

LGTM

// ignoring the timeout tick message.
if (_singleton is not null)
{
Timers.Cancel(IdentifySingletonTimeOutTick.Instance);
Member

I might consider adding a debug message here in case there's an actual bug that led to this, but that's not a big deal. Good to check for this condition, turn the timer off, and return early though - LGTM

"ClusterSingletonProxy failed to find an associated singleton named [{0}] in role [{1}] after {2} seconds.",
_settings.SingletonName, _settings.Role, _settings.SingletonIdentificationFailurePeriod.TotalSeconds);

Context.System.EventStream.Publish(IdentifySingletonResult.Timeout(
Member

Ah, is this why IdentifySingletonResult is part of the public API? So apps can subscribe to it?

Contributor Author

Yes, I think it is a good addition. People can hook into this event to detect that something went wrong in their application, instead of having to check their logs.
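
As a rough sketch of what hooking into this event could look like, an application actor can subscribe to `IdentifySingletonResult` on the event stream. The `SingletonWatchdog` actor is a hypothetical name for this example, and the sketch assumes that `IdentifySingletonResult` stays public and lives in the `Akka.Cluster.Tools.Singleton` namespace. It deliberately does not read any members of the event, since their shape is not shown in this conversation.

```csharp
using Akka.Actor;
using Akka.Cluster.Tools.Singleton; // ASSUMED namespace of IdentifySingletonResult
using Akka.Event;

// Hypothetical application-side actor that reacts to singleton
// identification events instead of relying on log inspection.
public sealed class SingletonWatchdog : ReceiveActor
{
    private readonly ILoggingAdapter _log = Context.GetLogger();

    public SingletonWatchdog()
    {
        Receive<IdentifySingletonResult>(result =>
        {
            // Replace with alerting, metrics, or a health-check update.
            _log.Warning("Singleton identification event received: {0}", result);
        });
    }

    protected override void PreStart()
    {
        // Register for every IdentifySingletonResult published on the event stream.
        Context.System.EventStream.Subscribe(Self, typeof(IdentifySingletonResult));
    }

    protected override void PostStop()
    {
        // Remove all of this actor's event stream subscriptions.
        Context.System.EventStream.Unsubscribe(Self);
    }
}
```

Creating the actor once at startup, for example with `system.ActorOf(Props.Create(() => new SingletonWatchdog()))`, is enough to receive the events for the lifetime of the actor system.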

@Aaronontheweb Aaronontheweb merged commit fb526e5 into akkadotnet:dev Nov 11, 2024
10 of 12 checks passed