[Bug]: Duplicate data when sync data from milvus upstream to downstream #145
Comments
Which Milvus version are you using, and which CDC version? Is there highly concurrent insert/delete traffic?
@SimFG I am using Milvus 2.4.13 and CDC v2.0.0-rc2. The TPS is 2.
I just tested again, only adding data and deleting nothing, and I see:
@anhnch30820 You can try the CDC server from the latest main branch.
@SimFG I tried the CDC server from the latest main branch and got an error when creating a task:
It seems that the create request was processed correctly.
@SimFG But when I create a collection, nothing changes in the target cluster.
From the log, the collection in the source Milvus has not been created yet, because its state is still "creating". However, I suspect this problem is caused by leftover data from the previous run. To rule that out, I suggest cleaning up all environment data first, such as the CDC meta storage, and then redeploying both Milvus instances and the CDC service.
@SimFG I tried again, and the data is still duplicated.
How are you testing it? Are these the steps: insert data, then delete data, then use Attu to check the row count? Do you wait for a while before checking the row count? The deletes may not have been applied yet. If you don't want to wait, you can try calling flush. Can you find the differing rows and check whether some delete operations have not taken effect?
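The diff check suggested above can be sketched roughly as follows. This assumes the primary keys of both collections have already been exported into plain Python lists (ideally after a flush, so pending deletes are applied); how they are fetched from Milvus is left out. The function name `diff_primary_keys` and the sample IDs are hypothetical and are not part of milvus-cdc or Attu.

```python
from collections import Counter


def diff_primary_keys(upstream_ids, downstream_ids):
    """Compare two lists of primary keys and report the discrepancies."""
    up = Counter(upstream_ids)
    down = Counter(downstream_ids)
    return {
        # Keys present upstream but absent downstream (not replicated yet).
        "missing_downstream": sorted(set(up) - set(down)),
        # Keys present downstream but absent upstream (e.g. a delete not applied).
        "extra_downstream": sorted(set(down) - set(up)),
        # Keys that occur more than once downstream, with their counts.
        "duplicated_downstream": {k: n for k, n in sorted(down.items()) if n > 1},
    }


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    print(diff_primary_keys([1, 2, 3, 4], [1, 2, 2, 3, 5]))
```

A non-empty `duplicated_downstream` would point at repeated inserts, while a non-empty `extra_downstream` would point at deletes that have not taken effect.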
Every PR is covered by integration tests, and the CDC process is tested daily. In theory, such a small amount of data is unlikely to go wrong.
@SimFG Here is the upstream:
@SimFG Could you provide the latest milvus-cdc binary?
You can clone the repo and, in the repo directory, execute the command:
Can you confirm that the two Milvus instances are completely independent? The downstream Milvus looks abnormal to me. The extra data seems to come from one segment's rows being counted again on another segment: 318 = 169 + 149.
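One rough way to test this kind of hypothesis is to check whether an observed row count equals the sum of some combination of per-segment row counts. The sketch below is only an illustration: the function name and segment IDs are hypothetical, the counts 169, 149, and 318 are the ones quoted above, and the third segment with 200 rows is made up for the example.

```python
from itertools import combinations


def segments_summing_to(row_counts, target):
    """Return every combination of two or more segment IDs whose row counts add up to target."""
    matches = []
    ids = sorted(row_counts)
    for size in range(2, len(ids) + 1):
        for combo in combinations(ids, size):
            if sum(row_counts[s] for s in combo) == target:
                matches.append(combo)
    return matches


if __name__ == "__main__":
    # seg_a and seg_b use the counts from the thread; seg_c is hypothetical.
    counts = {"seg_a": 169, "seg_b": 149, "seg_c": 200}
    print(segments_summing_to(counts, 318))
```

A match only shows that the numbers are consistent with double counting; it does not prove where the extra rows came from.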
@SimFG
@anhnch30820 Check whether the position is set incorrectly. You can first try without the position to see whether CDC works properly.
@SimFG It does not work with large data.
This test is meant to check whether the position parameter is passed in when the task is created. In addition, the behavior seen in Attu seems to be caused by duplicate data. I am currently developing a data difference checking tool.
Current Behavior
Expected Behavior
No response
Steps To Reproduce
No response
Environment
No response
Anything else?
No response