Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: duplicated schemas on rapid elections while continuous produce of records #938

Conversation

eliax1996
Copy link
Contributor

@eliax1996 eliax1996 commented Aug 21, 2024

coordinator rewrite

  1. moving the coordinator in a separate thread
  2. adding a waiting time between when the master its elected and the master can act. This has been done to avoid rapid elections of master that may produce schemas with different ids.

Example of what could happpen without the delay:

|--------------------------------------|
|Node | Node1    | Node2    | Node3    |
|Role | Master   | Follower | Follower |
|--------------------------------------|


Node1 -> Send Message A{id=max(current_ids)} to kafka

where the max(current_ids) = 10

---------------------------------------

Node1 its disconnected, the message its still in the producer queue of Node1

---------------------------------------

Node2 its elected master

|--------------------------------------|
|Node | Node1    | Node2    | Node3    |
|Role | Follower | Master   | Follower |
|--------------------------------------|

----------------------------------------


Node2 produces a message B{id=max(current_ids)} to kafka

Because the message A isn't yet delivered to Node2, the max(current_ids) returns still 10.
And we have an ID clash.

The solution its simple, each master should wait a reasonable high number of milliseconds before acting as a master.
So that all the in-flight messages are delivered to kafka + the reasonable delay of the consumer for the master node before noticing that a message has been produced

@eliax1996 eliax1996 force-pushed the eliax1996/make-sure-master-wait-a-while-before-being-active-and-if-new-messages-are-arriving branch from 7b04f19 to 285c950 Compare August 22, 2024 16:25
@eliax1996 eliax1996 changed the title WIP: fix duplicated schemas on rapid elections while continuous produce of records fix: duplicated schemas on rapid elections while continuous produce of records Aug 22, 2024
@eliax1996 eliax1996 force-pushed the eliax1996/make-sure-master-wait-a-while-before-being-active-and-if-new-messages-are-arriving branch 2 times, most recently from 52fb34c to 054efb7 Compare August 22, 2024 16:50
@eliax1996
Copy link
Contributor Author

immagine

@eliax1996 eliax1996 force-pushed the eliax1996/make-sure-master-wait-a-while-before-being-active-and-if-new-messages-are-arriving branch 5 times, most recently from 1fa1052 to 7f657bd Compare August 23, 2024 10:14
Copy link

github-actions bot commented Aug 23, 2024

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  src/karapace
  config.py
  schema_reader.py 377-378
  schema_registry.py 100
  schema_registry_apis.py 106-107, 687-710, 1352
  typing.py 111, 115
  src/karapace/coordinator
  master_coordinator.py 104, 106-107, 163, 179
  schema_coordinator.py 198, 213-214, 499
Project Total  

This report was generated by python-coverage-comment-action

@eliax1996 eliax1996 force-pushed the eliax1996/make-sure-master-wait-a-while-before-being-active-and-if-new-messages-are-arriving branch from 49a9fd2 to c4e58e8 Compare August 23, 2024 14:18
@jclarysse
Copy link
Contributor

Please also add the new config waiting_time_before_acting_as_master_ms to https://github.com/Aiven-Open/karapace/blob/main/README.rst#configuration-keys

@eliax1996 eliax1996 force-pushed the eliax1996/make-sure-master-wait-a-while-before-being-active-and-if-new-messages-are-arriving branch 6 times, most recently from 8183430 to 784a2fa Compare August 29, 2024 13:34
karapace/schema_reader.py Outdated Show resolved Hide resolved
@eliax1996 eliax1996 force-pushed the eliax1996/make-sure-master-wait-a-while-before-being-active-and-if-new-messages-are-arriving branch from 784a2fa to 3f84dff Compare September 18, 2024 14:15
@eliax1996 eliax1996 force-pushed the eliax1996/make-sure-master-wait-a-while-before-being-active-and-if-new-messages-are-arriving branch 7 times, most recently from 9f3d8bc to 25e6c16 Compare November 13, 2024 15:56
@@ -1307,7 +1335,7 @@ async def _forward_request_remote(
if auth_header is not None:
headers["Authorization"] = auth_header

with async_timeout.timeout(timeout):
async with async_timeout.timeout(timeout):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was wrong since a while

Comment on lines 553 to 532
LOG.info("Resetting generation status")
# this is called immediately after the election, we shouldn't reset this
# until a new node its elected aka the other path where a new node its elected
# otherwise this its called at each round and we keep not counting the 5 seconds
# required before the election.
# self._are_we_master = False
self.generation = OffsetCommitRequest.DEFAULT_GENERATION_ID
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was called because we were exiting the election loop in the _async_loop of the master_coordinator.py. We must keep running the thread/algorithm otherwise we are always electing a new node since that causes a rebalance and the rebalance causes a new election (the rebalance happen because we sent the reset_generation as a side effect of closing the heartbeat task)

@eliax1996 eliax1996 force-pushed the eliax1996/make-sure-master-wait-a-while-before-being-active-and-if-new-messages-are-arriving branch 2 times, most recently from a422f24 to d7cdd8a Compare November 14, 2024 07:39
Comment on lines 93 to 98
# why do we need to close?
# we just need to keep running even when the schema registry its ready
# otherwise we cause a rebalance and a new election. This should run until
# karapace is restarted
# if self._sc.ready():
# break
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this question its mainly for @jjaakola-aiven. I inherited the initial implementation from him. I think we shouldn't exit but I wait for him to reply here

@eliax1996 eliax1996 force-pushed the eliax1996/make-sure-master-wait-a-while-before-being-active-and-if-new-messages-are-arriving branch 3 times, most recently from 2385470 to 0e374a7 Compare November 19, 2024 13:19
@eliax1996 eliax1996 marked this pull request as ready for review November 19, 2024 13:19
@eliax1996 eliax1996 requested review from a team as code owners November 19, 2024 13:19
@eliax1996 eliax1996 force-pushed the eliax1996/make-sure-master-wait-a-while-before-being-active-and-if-new-messages-are-arriving branch 8 times, most recently from d8067fd to 5cb351b Compare November 21, 2024 10:27
@eliax1996 eliax1996 force-pushed the eliax1996/make-sure-master-wait-a-while-before-being-active-and-if-new-messages-are-arriving branch 2 times, most recently from 1da58a9 to f3fc5a3 Compare November 27, 2024 11:48
if msg_keymode == KeyMode.DEPRECATED_KARAPACE:
self.key_formatter.set_keymode(KeyMode.DEPRECATED_KARAPACE)
with self._ready_lock:
if not self._ready and self.key_formatter.get_keymode() == KeyMode.CANONICAL:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the changes done the key mode detection could be run during the normal operation. Not anymore only at startup.

src/karapace/coordinator/schema_coordinator.py Outdated Show resolved Hide resolved
)
await asyncio.sleep(0.5)

while not secondary.are_we_master():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will loop until test timeout. Maybe this could abort earlier if secondary is not set in reasonable time?

@eliax1996 eliax1996 force-pushed the eliax1996/make-sure-master-wait-a-while-before-being-active-and-if-new-messages-are-arriving branch from f3fc5a3 to 449a3ff Compare November 28, 2024 08:43
1. moving the coordinator in a separate thread
2. adding a waiting time between when the master its elected and the master can act. This has been done to avoid rapid elections of master that may produce schemas with different ids.

Example of what could happpen without the delay:

|--------------------------------------|
|Node | Node1    | Node2    | Node3    |
|Role | Master   | Follower | Follower |
|--------------------------------------|

Node1 -> Send Message A{id=max(current_ids)} to kafka

where the max(current_ids) = 10

---------------------------------------

Node1 its disconnected, the message its still in the producer queue of Node1

---------------------------------------

Node2 its elected master

|--------------------------------------|
|Node | Node1    | Node2    | Node3    |
|Role | Follower | Master   | Follower |
|--------------------------------------|

----------------------------------------

Node2 produces a message B{id=max(current_ids)} to kafka

Because the message A isn't yet delivered to Node2, the max(current_ids) returns still 10.
And we have an ID clash.

The solution its simple, each master should wait a reasonable high number of milliseconds before acting as a master.
So that all the in-flight messages are delivered to kafka + the reasonable delay of the consumer for the master node before noticing that a message has been produced
@eliax1996 eliax1996 force-pushed the eliax1996/make-sure-master-wait-a-while-before-being-active-and-if-new-messages-are-arriving branch from 449a3ff to 6509829 Compare November 28, 2024 08:57
Comment on lines +446 to +447
if self.key_formatter.get_keymode() == KeyMode.CANONICAL and msg_keymode == KeyMode.DEPRECATED_KARAPACE:
self.key_formatter.set_keymode(KeyMode.DEPRECATED_KARAPACE)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its better to keep the behaviour in that way, otherwise based on the restarts the cluster can behave in different ways. Lets make an example:

  1. Karapace starts, all the messages are CANONICAL
  2. Karapace become ready
  3. Someone produce a DEPRECATED record (its an illegal behaviour but its possible to have it in production). Now, since ready its True Karapace will produce always only CANONICAL records
  4. Karapace restarts, repeat the read
  5. Now it produces only DEPRECATED records

We have proved that this "state" its not based on the input but on the sequence of events and its not stable, a restart can change the behaviour of the node causing an extremely hard issue to detect a runtime.

The proper fix should be to keep the info about witch key mode to use in the message itself, so karapace will always select the right formatter once a record its updated/changed/deleted

@jjaakola-aiven jjaakola-aiven merged commit f43ff21 into main Nov 28, 2024
9 checks passed
@jjaakola-aiven jjaakola-aiven deleted the eliax1996/make-sure-master-wait-a-while-before-being-active-and-if-new-messages-are-arriving branch November 28, 2024 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants