fix: duplicated schemas on rapid elections while continuous produce of records #938
Conversation
Please also add the new config.
src/karapace/schema_registry_apis.py
Outdated
@@ -1307,7 +1335,7 @@ async def _forward_request_remote(
     if auth_header is not None:
         headers["Authorization"] = auth_header

-    with async_timeout.timeout(timeout):
+    async with async_timeout.timeout(timeout):
This had been wrong for a while.
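The fix above replaces a plain `with` on async-timeout's context manager with `async with`. A minimal sketch of the difference, using the stdlib `asyncio.wait_for` instead of the third-party `async_timeout` package (the function names below are hypothetical stand-ins, not Karapace's actual code):

```python
import asyncio

async def slow_remote_call() -> str:
    # Hypothetical stand-in for the forwarded registry request.
    await asyncio.sleep(10)
    return "response"

async def forward_with_timeout(timeout: float) -> str:
    # async-timeout's timeout() is an *async* context manager: entering it
    # with a plain `with` never arms the timeout, so a forwarded request
    # could hang. The stdlib equivalent used here is asyncio.wait_for.
    return await asyncio.wait_for(slow_remote_call(), timeout)

async def main() -> str:
    try:
        await forward_with_timeout(0.1)
        return "no timeout"
    except asyncio.TimeoutError:
        return "timed out"

result = asyncio.run(main())
print(result)
```

With a correctly armed timeout, the slow call is cancelled after 0.1 s rather than blocking for the full 10 s.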
LOG.info("Resetting generation status")
# This is called immediately after the election; we shouldn't reset this
# until a new node is elected (i.e. the other code path where a new node
# is elected), otherwise this is called on every round and the 5 seconds
# required before the election are never counted.
# self._are_we_master = False
self.generation = OffsetCommitRequest.DEFAULT_GENERATION_ID
This was called because we were exiting the election loop in the _async_loop of master_coordinator.py. We must keep the thread/algorithm running; otherwise we keep electing a new node, since exiting causes a rebalance and the rebalance causes a new election (the rebalance happens because we send the reset_generation as a side effect of closing the heartbeat task).
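The "keep the loop running" argument can be sketched as follows (hypothetical class and method names, not Karapace's actual coordinator):

```python
import threading
import time

class CoordinatorLoop:
    # Hypothetical sketch: keep the election/heartbeat loop alive.
    # Exiting the loop closes the heartbeat task, which triggers a group
    # rebalance, which in turn forces a fresh election - so the loop only
    # stops when the whole process shuts down.

    def __init__(self) -> None:
        self._stop = threading.Event()
        self.heartbeats = 0

    def _send_heartbeat(self) -> None:
        # Stand-in for the real group heartbeat to the Kafka broker.
        self.heartbeats += 1

    def run(self, interval: float = 0.01) -> None:
        while not self._stop.is_set():
            self._send_heartbeat()
            # NOTE: no `break` once elected/ready - leaving would rebalance.
            self._stop.wait(interval)

    def shutdown(self) -> None:
        self._stop.set()

loop = CoordinatorLoop()
t = threading.Thread(target=loop.run, daemon=True)
t.start()
time.sleep(0.1)     # the loop keeps heart-beating the whole time
loop.shutdown()     # only an explicit shutdown stops it
t.join()
```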
# Why do we need to close?
# We need to keep running even when the schema registry is ready;
# otherwise we cause a rebalance and a new election. This should run
# until Karapace is restarted.
# if self._sc.ready():
#     break
This question is mainly for @jjaakola-aiven; I inherited the initial implementation from him. I think we shouldn't exit, but I'll wait for him to reply here.
src/karapace/schema_reader.py
Outdated
if msg_keymode == KeyMode.DEPRECATED_KARAPACE:
    self.key_formatter.set_keymode(KeyMode.DEPRECATED_KARAPACE)
with self._ready_lock:
    if not self._ready and self.key_formatter.get_keymode() == KeyMode.CANONICAL:
With these changes, key mode detection can run during normal operation, not only at startup.
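The idea of letting the key mode change at any point, not only before readiness, can be sketched like this (hypothetical minimal classes, not Karapace's actual `KeyFormatter`):

```python
import enum
import threading

class KeyMode(enum.Enum):
    CANONICAL = "canonical"
    DEPRECATED_KARAPACE = "deprecated_karapace"

class KeyFormatter:
    # Hypothetical sketch: the keymode may be downgraded during normal
    # operation, guarded by a lock since the reader runs in a thread.

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._keymode = KeyMode.CANONICAL

    def set_keymode(self, mode: KeyMode) -> None:
        with self._lock:
            self._keymode = mode

    def get_keymode(self) -> KeyMode:
        with self._lock:
            return self._keymode

def handle_message(formatter: KeyFormatter, msg_keymode: KeyMode) -> None:
    # Deprecated keys can appear at any time, so the downgrade is no
    # longer gated on "not ready yet": any DEPRECATED_KARAPACE key
    # switches the formatter.
    if msg_keymode == KeyMode.DEPRECATED_KARAPACE:
        formatter.set_keymode(KeyMode.DEPRECATED_KARAPACE)

fmt = KeyFormatter()
handle_message(fmt, KeyMode.CANONICAL)            # no change
handle_message(fmt, KeyMode.DEPRECATED_KARAPACE)  # downgrade at runtime
```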
)
await asyncio.sleep(0.5)

while not secondary.are_we_master():
This will loop until the test timeout. Maybe this could abort earlier if the secondary is not elected within a reasonable time?
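The suggested early abort can be sketched as a bounded polling helper (hypothetical names; `FakeCoordinator` stands in for the secondary node under test):

```python
import asyncio

async def wait_until(predicate, timeout: float = 5.0, interval: float = 0.05) -> bool:
    # Poll `predicate` until it returns True or the deadline passes;
    # returning False lets the test fail fast with a clear assertion
    # instead of spinning until the suite-level timeout.
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        if predicate():
            return True
        await asyncio.sleep(interval)
    return False

class FakeCoordinator:
    # Hypothetical stand-in for the secondary node under test.
    def __init__(self) -> None:
        self.master = False

    def are_we_master(self) -> bool:
        return self.master

async def main() -> bool:
    secondary = FakeCoordinator()
    # Simulate the election completing shortly after the wait starts.
    asyncio.get_running_loop().call_later(
        0.1, lambda: setattr(secondary, "master", True)
    )
    return await wait_until(secondary.are_we_master, timeout=2.0)

elected = asyncio.run(main())
```

The test would then do `assert await wait_until(secondary.are_we_master)` rather than an unbounded `while not secondary.are_we_master()`.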
This PR does two things:

1. Moving the coordinator into a separate thread.
2. Adding a waiting time between when the master is elected and when the master can act.

This has been done to avoid rapid master elections producing schemas with duplicate ids. Example of what could happen without the delay:

|--------------------------------------|
|Node | Node1    | Node2    | Node3    |
|Role | Master   | Follower | Follower |
|--------------------------------------|

Node1 sends message A{id=max(current_ids)} to Kafka, where max(current_ids) = 10. Node1 is then disconnected; the message is still in Node1's producer queue. Node2 is elected master:

|--------------------------------------|
|Node | Node1    | Node2    | Node3    |
|Role | Follower | Master   | Follower |
|--------------------------------------|

Node2 produces a message B{id=max(current_ids)} to Kafka. Because message A hasn't yet been delivered to Node2, max(current_ids) still returns 10, and we have an ID clash.

The solution is simple: each master should wait a reasonably high number of milliseconds before acting as master, so that all in-flight messages are delivered to Kafka, plus the reasonable consumer delay for the master node to notice that a message has been produced.
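The grace period between winning the election and acting as master can be sketched like this (hypothetical class; the actual delay and config key are defined in the PR):

```python
import time

class MasterState:
    # Hypothetical sketch: a newly elected master waits before acting,
    # so in-flight schema messages from the previous master reach Kafka
    # (and are consumed) before max(current_ids) is computed.

    def __init__(self, wait_before_acting: float) -> None:
        self._wait = wait_before_acting
        self._elected_at = None

    def on_elected(self) -> None:
        # Record when we won the election; do not act yet.
        self._elected_at = time.monotonic()

    def can_act_as_master(self) -> bool:
        # Only act once the grace period has elapsed, so max(current_ids)
        # already reflects the previous master's in-flight messages.
        if self._elected_at is None:
            return False
        return time.monotonic() - self._elected_at >= self._wait

state = MasterState(wait_before_acting=0.1)
state.on_elected()
early = state.can_act_as_master()   # still inside the grace period
time.sleep(0.15)
late = state.can_act_as_master()    # grace period elapsed
```

In the ID-clash example above, Node2 would answer "not master yet" during the grace period, giving message A time to arrive before any new id is assigned.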
if self.key_formatter.get_keymode() == KeyMode.CANONICAL and msg_keymode == KeyMode.DEPRECATED_KARAPACE:
    self.key_formatter.set_keymode(KeyMode.DEPRECATED_KARAPACE)
I think it's better to keep the behaviour that way; otherwise, depending on restarts, the cluster can behave in different ways. Let's make an example:

1. Karapace starts; all the messages are CANONICAL.
2. Karapace becomes ready.
3. Someone produces a DEPRECATED record (it's illegal behaviour, but it's possible to have it in production). Now, since ready is True, Karapace will always produce only CANONICAL records.
4. Karapace restarts and repeats the read.
5. Now it produces only DEPRECATED records.

We have shown that this "state" is not based on the input but on the sequence of events, and it's not stable: a restart can change the behaviour of the node, causing an issue that's extremely hard to detect at runtime.

The proper fix would be to keep the info about which key mode to use in the message itself, so Karapace will always select the right formatter once a record is updated/changed/deleted.
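The "proper fix" described above could look roughly like this: tag each record with the key mode it was read with, so later updates re-serialize the key consistently. All names here are hypothetical, not Karapace's actual storage format:

```python
import enum

class KeyMode(enum.Enum):
    CANONICAL = "canonical"
    DEPRECATED_KARAPACE = "deprecated_karapace"

def store_record(raw_key: dict, detected_keymode: KeyMode) -> dict:
    # Persist the keymode the record was read with alongside the record,
    # instead of tracking a single mutable global formatter state.
    return {"key": raw_key, "keymode": detected_keymode}

def keymode_for_update(record: dict) -> KeyMode:
    # On update/delete, select the formatter from the record itself,
    # making the behaviour independent of restarts and event ordering.
    return record["keymode"]

rec = store_record({"subject": "s1", "version": 1}, KeyMode.DEPRECATED_KARAPACE)
mode = keymode_for_update(rec)  # the record carries its own key mode
```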
coordinator rewrite