Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove cluster state version downgrade fallback #32297

Merged
merged 1 commit into from
Aug 29, 2024

Conversation

vekterli
Copy link
Member

@geirst please review

To avoid inherent race conditions with overlapping cluster controller leader intervals (caused by the old leader not yet knowing it has been deposed) where both an old state version and a newer state version is concurrently published, we want to only accept strictly increasing version numbers (for the lifetime of a process; these are currently not durably stored on content nodes). On the cluster controllers themselves, this version number is backed by a ZooKeeper quorum, ensuring that it is durably stored where it matters the most.

A content node only observing strictly increasing version numbers is an invariant that holds unless an explicit fallback is triggered, where we can still accept an older version.

This fallback was intended to be a "failsafe" if ZooKeeper state on the cluster controllers was lost, but its implementation depended on information that is not actually present in all CC RPCs, meaning that it could kick in even when not intended, thus rendering the race condition protection void.

The CC RPC in question is not easily extensible, so instead remove the fallback entirely. This has the bonus of content nodes actually being able to rely on the version invariant internally. Downside is that content node and distributor processes must be restarted to accept a lower state version upon ZK state loss, but in that case you probably have bigger problems.

To avoid inherent race conditions with overlapping cluster
controller leader intervals (caused by the old leader not yet knowing
it has been deposed) where both an old state version and a
newer state version is concurrently published, we want to only
accept strictly increasing version numbers (for the lifetime of
a process; these are currently not durably stored on content nodes).
These version numbers are backed by a ZooKeeper quorum, ensuring that
they _are_ durably stored for cluster controllers.

A content node only observing strictly increasing version numbers is
an invariant that holds _unless_ an explicit fallback is triggered,
where we can still accept an older version.

This fallback was intended to be a "failsafe" if ZooKeeper state on
the cluster controllers was lost, but its implementation depended
on information that is not actually present in all CC RPCs, meaning
that it could kick in even when not intended, thus rendering the
race condition protection void.

The CC RPC in question is not easily extensible, so instead remove
the fallback entirely. This has the bonus of content nodes actually
being able to rely on the version invariant internally. Downside is
that content node and distributor processes must be restarted to
accept a lower state version upon ZK state loss, but in that case
you probably have bigger problems.
@vekterli vekterli requested a review from geirst August 29, 2024 09:31
Copy link
Member

@geirst geirst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@vekterli vekterli merged commit d235a77 into master Aug 29, 2024
3 checks passed
@vekterli vekterli deleted the vekterli/remove-state-version-downgrade-fallback branch August 29, 2024 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants