cdc: add a description about checksum compatibility #19220
Open: 3AceShowHand wants to merge 14 commits into pingcap:master from 3AceShowHand:cdc-checksum-compatibility
+22
−4
01c658c add a description about checksum compatibility (3AceShowHand)
089dbca add a description about checksum compatibility (3AceShowHand)
bf39f26 add a description about checksum compatibility (3AceShowHand)
7944645 add a description about checksum compatibility (3AceShowHand)
6268ece add a description about checksum compatibility (3AceShowHand)
a9151a1 add a description about checksum compatibility (3AceShowHand)
40c9172 add a description about checksum compatibility (3AceShowHand)
680b633 fix the line break (3AceShowHand)
af45552 Update ticdc/ticdc-integrity-check.md (3AceShowHand)
1e429c6 Update ticdc/ticdc-integrity-check.md (3AceShowHand)
614fdc7 Update ticdc/ticdc-integrity-check.md (3AceShowHand)
ebd3e77 Update ticdc/ticdc-integrity-check.md (3AceShowHand)
6ea30ae Update ticdc/ticdc-integrity-check.md (3AceShowHand)
fea533b Update ticdc/ticdc-integrity-check.md (3AceShowHand)
@@ -61,19 +61,23 @@ By default, TiCDC disables the single-row data Checksum verification feature. If, after enabling this feat…

## Checksum algorithms

+ This section describes the evolution of the Checksum algorithm in TiCDC. Different Checksum algorithm versions affect the Checksum verification process inside TiCDC, but do not affect the rules that downstream Kafka consumers use to verify Checksums.
+
### Checksum V1

- Before v8.4.0, TiDB and TiCDC use the Checksum V1 algorithm for Checksum calculation and verification.
+ In versions from v7.1.0 through v8.2.0, TiDB and TiCDC use the Checksum V1 algorithm for Checksum calculation and verification.

After you enable the single-row data Checksum verification feature, TiDB uses the CRC32 algorithm to calculate the Checksum of each row and stores that value in TiKV together with the row data. TiCDC then reads the data from TiKV and recalculates the Checksum using the same algorithm. If the recalculated value equals the Checksum written by TiDB, the data was transmitted correctly from TiDB to TiCDC.

TiCDC encodes the data into a specific format and sends it to Kafka. After a Kafka consumer reads the data, it can calculate a new Checksum using the same CRC32 algorithm as TiDB and compare it with the Checksum carried in the data. If the two values match, the data was transmitted correctly from TiCDC to the Kafka consumer.
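
The compute-then-recompute pattern described above can be sketched in a few lines of Python. This is only an illustration of CRC32 verification in general: the function names and the byte string standing in for an encoded row are assumptions, and the real row encoding used by TiDB and TiCDC is not shown here.

```python
import zlib


def row_checksum(encoded_row: bytes) -> int:
    """Return the CRC32 of an already-encoded row as an unsigned 32-bit integer."""
    return zlib.crc32(encoded_row) & 0xFFFFFFFF


def verify(encoded_row: bytes, expected: int) -> bool:
    """Recompute the checksum on the receiving side and compare it with the stored one."""
    return row_checksum(encoded_row) == expected


# Writer side: compute the checksum and ship it alongside the row.
row = b"id=1,name=alice"  # hypothetical encoding, for illustration only
stored = row_checksum(row)

# Reader side: an intact row verifies, a corrupted row does not.
intact_ok = verify(row, stored)
corrupted_ok = verify(b"id=1,name=alicf", stored)
```

The same comparison applies at both hops (TiDB to TiCDC, and TiCDC to the Kafka consumer): only the party that recomputes the value changes.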
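
### Checksum V2

- Starting from v8.4.0, TiDB and TiCDC introduce the Checksum V2 algorithm, which resolves the issue that Checksum V1 cannot correctly verify Old Value data in Update or Delete events after `ADD COLUMN` or `DROP COLUMN` is executed.
+ In v8.3.0, TiDB and TiCDC use the Checksum V2 algorithm, which resolves the issue that Checksum V1 cannot correctly verify Old Value data in Update or Delete events after `ADD COLUMN` or `DROP COLUMN` is executed. The Checksum V2 algorithm calculates a byte-level Checksum based on the key-value pair, where the key consists of the table ID and the row ID.

+ ### Checksum V3
+
- For clusters newly created in v8.4.0 or later, or clusters upgraded from an earlier version to v8.4.0, after the single-row data Checksum verification feature is enabled, TiDB uses the Checksum V2 algorithm by default for Checksum calculation and verification. TiCDC supports handling both V1 and V2 Checksums. This change only affects the internal implementation of TiDB and TiCDC and does not affect how downstream Kafka consumers calculate and verify Checksums.
+ Starting from v8.4.0, TiDB and TiCDC use the Checksum V3 algorithm. This algorithm resolves the issue that, in [BR](/br/backup-and-restore-overview.md) restore scenarios, Old Value Checksum verification fails under Checksum V2 because table IDs are rewritten. The Checksum V3 algorithm calculates a byte-level Checksum based on the table ID and the Value part.

## Checksum calculation rules
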
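@@ -109,4 +113,18 @@ fn checksum(columns) {
> **Note:**
>
> - After the Checksum verification feature is enabled, data of the DECIMAL and UNSIGNED BIGINT types is converted to the string type. Downstream consumer code therefore needs to convert these values back to the corresponding numeric types before performing Checksum calculations.
- > - Delete events contain only the Handle Key columns, whereas the Checksum is calculated based on all columns, so Delete events are not included in Checksum verification.
+ > - Delete events contain only the Handle Key columns, whereas the Checksum is calculated based on all columns, so Delete events are not included in Checksum verification.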
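
As a rough sketch of the conversion step mentioned in the note, a consumer might turn the string-encoded values back into numeric types as follows. The `to_numeric` helper and the type labels passed to it are hypothetical; how a particular consumer learns the column type depends on the message format it decodes.

```python
from decimal import Decimal


def to_numeric(column_type: str, raw: str):
    """Convert a string-encoded column value back to a numeric type."""
    if column_type == "DECIMAL":
        return Decimal(raw)  # exact decimal, avoids float rounding
    if column_type == "UNSIGNED BIGINT":
        value = int(raw)
        if not 0 <= value <= 2**64 - 1:
            raise ValueError(f"out of range for UNSIGNED BIGINT: {raw}")
        return value
    raise ValueError(f"unexpected column type: {column_type}")


price = to_numeric("DECIMAL", "12.340")
counter = to_numeric("UNSIGNED BIGINT", "18446744073709551615")
```

Using `Decimal` rather than `float` keeps the scale of the original value, and the range check makes an unexpected negative or oversized value fail loudly instead of silently feeding a wrong number into the Checksum calculation.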
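+
+ ## Compatibility issues
+
+ ### Compatibility in upgrade scenarios
+
+ When upgrading a cluster, upgrade TiCDC first and then TiDB. During the upgrade, while TiCDC is on the newer version and TiDB is on the older version, TiCDC can handle Checksums written by the older TiDB. After the upgrade is complete, make sure that TiDB and TiCDC use the same version of the Checksum algorithm.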
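+
+ ### Compatibility in BR restore scenarios
+
+ In v8.3.0 and v8.4.0, the Checksum feature has the following compatibility issue:
+
+ When you use BR to back up v8.3.0 data and restore it to a TiDB cluster of v8.3.0 or later, TiCDC might fail to verify the Old Value if the Changefeed encounters Update or Delete events during replication. The reason is that when BR restores data, if the ID of a restored table is already in use in the target cluster, BR reassigns the table ID but does not update the Checksum value. As a result, the table ID that TiCDC uses during verification differs from the table ID used when the data was written in the source cluster, which causes verification to fail.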
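
The failure mode described above can be reproduced with a toy model. The sketch below assumes a checksum computed over a key (table ID plus row ID) followed by the value bytes; the `checksum_with_key` helper and its key layout are invented for illustration and are not the actual TiDB encoding.

```python
import struct
import zlib


def checksum_with_key(table_id: int, row_id: int, value: bytes) -> int:
    """CRC32 over a toy key (table ID + row ID) followed by the value bytes."""
    key = struct.pack(">qq", table_id, row_id)  # hypothetical key layout
    return zlib.crc32(key + value) & 0xFFFFFFFF


value = b"old-row-bytes"

# Source cluster: the checksum is computed with table ID 100 and stored with the row.
stored = checksum_with_key(100, 1, value)

# Target cluster: the row is restored under table ID 200, but the stored checksum is kept.
recomputed = checksum_with_key(200, 1, value)

matches_before_restore = checksum_with_key(100, 1, value) == stored
matches_after_restore = recomputed == stored
```

Although the value bytes are identical, the recomputed checksum no longer matches the stored one once the table ID changes, which mirrors the Old Value verification failure after a restore that reassigns table IDs.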
+
+ If you encounter this issue, it is recommended that you [disable the Checksum verification feature of the Changefeed](#关闭功能).
How can users check which Checksum algorithm version TiDB and TiCDC are using? If there is documentation for this, please add a link to it.

Users do not need to care about the Checksum algorithm version; it is an internal verification mechanism of TiDB and TiCDC. The product versions that correspond to each algorithm are mentioned in the document.

@3AceShowHand If users do not need to care about it, then who is the subject of "make sure that TiDB and TiCDC use the same version of the Checksum algorithm" here?

Users do not care about the implementation details; they only need to care about the version number. The subject is the user.