(Bug report) Upgrade caused minor corruption in some Chinese character note titles #4412

Closed
Nriver opened this issue Nov 8, 2023 · 4 comments

Comments

@Nriver
Contributor

Nriver commented Nov 8, 2023

Trilium Version

0.61.13

What operating system are you using?

Windows

What is your setup?

Local + server sync

Operating System Version

win10

Description

A user told me that some of their note titles were broken after the upgrade. I checked my own notes and found that a handful of titles were slightly corrupted, about 30 out of 15,000.

During my investigation, I came across a peculiar character within these titles.

ksnip_20231108-170321

Notably, all the affected titles contained Chinese characters, although the original characters varied.

Error logs

No response

@zadam
Owner

zadam commented Nov 12, 2023

That's strange ... in the 0.61 upgrade, there was quite a lot of data migration, but specifically note titles were not touched at all.

The only thing I can think of is the better-sqlite3 upgrade (containing sqlite upgrade) and electron upgrade, perhaps some Unicode support change / bug.

@Nriver
Contributor Author

Nriver commented Dec 14, 2023

After numerous rounds of testing (literally hundreds), I have pinpointed the root cause and resolved the issue. In summary, the culprit was an automatic encoding conversion.

@Nriver
Contributor Author

Nriver commented Dec 14, 2023

I used mitmproxy to monitor the HTTP requests made while initializing data from an existing server.

The data flow is: server -> mitmproxy -> client. Notably, putting mitmproxy in the path reduced the number of notes corrupted during initialization from 30-40 to 3. The corruption, which had previously hit random notes, also became consistent: the same three notes were affected every time.

While debugging, I discovered that the entity_changes component was receiving corrupted data.

ksnip_20231213-173505

However, the data is intact when it passes through mitmproxy, which suggests that the server sends correct data and the client somehow fails to receive it accurately.

ksnip_20231213-174438

Further debugging led me to identify a suspicious code section:

ksnip_20231214-094259

In this code, string concatenation makes Node.js implicitly convert each incoming chunk to a string, decoding it as UTF-8 on its own. Most Chinese characters occupy 3 bytes in UTF-8, so if a character straddles the boundary between two chunks, its bytes are split across them. Neither fragment is valid UTF-8 by itself, so decoding produces the replacement character "�" in place of the original character.

This can be reproduced by manually decoding each chunk on its own, which yields the same corrupted strings.

ksnip_20231214-105907
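
For readers without access to the screenshots, here is a minimal sketch of the same kind of reproduction. It is not the code from Trilium; the title string, the variable names, and the split offset are made up for illustration. It only shows what happens when a Buffer chunk that ends in the middle of a multi-byte character is appended to a string.

```javascript
// Hypothetical example data: any title with multi-byte characters will do.
const title = '笔记标题';                    // 4 characters, 12 bytes in UTF-8
const bytes = Buffer.from(title, 'utf8');

// Simulate a network chunk boundary that falls inside the second character
// (bytes 3..5), so the character's 3 bytes are split 1 + 2.
const chunk1 = bytes.subarray(0, 4);
const chunk2 = bytes.subarray(4);

// Problematic pattern: `string += buffer` implicitly calls buffer.toString(),
// which decodes each chunk as UTF-8 in isolation.
let corrupted = '';
for (const chunk of [chunk1, chunk2]) {
    corrupted += chunk;
}

// The split character can no longer be decoded and is replaced by U+FFFD.
console.log(corrupted.includes('\uFFFD')); // true
console.log(corrupted === title);          // false
```

Characters that lie entirely inside one chunk decode correctly, which fits the observation that only a small fraction of titles were affected.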

I've submitted a fix in #4522.
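
The actual change lives in #4522 and is not reproduced here. As a rough sketch under my own assumptions, these are two common ways to avoid this class of bug in Node.js: collect the raw buffers and decode once at the end, or use the built-in `StringDecoder`, which holds back incomplete multi-byte sequences until the next chunk arrives. The function names and sample data are hypothetical.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical sample: a title split in the middle of a 3-byte character.
const original = '笔记标题';
const raw = Buffer.from(original, 'utf8');
const chunks = [raw.subarray(0, 4), raw.subarray(4)];

// Option 1: keep the chunks as Buffers and decode the joined bytes once.
function decodeAtEnd(bufferChunks) {
    return Buffer.concat(bufferChunks).toString('utf8');
}

// Option 2: decode incrementally; StringDecoder buffers a trailing partial
// character and emits it once the remaining bytes have been written.
function decodeStreaming(bufferChunks) {
    const decoder = new StringDecoder('utf8');
    let text = '';
    for (const chunk of bufferChunks) {
        text += decoder.write(chunk);
    }
    return text + decoder.end();
}

console.log(decodeAtEnd(chunks) === original);     // true
console.log(decodeStreaming(chunks) === original); // true
```

Option 1 is simpler when the whole response body is needed before parsing anyway; option 2 suits cases where text has to be processed as it streams in.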

@zadam
Owner

zadam commented Dec 27, 2023

Hats off, awesome detective work!

@Nriver Nriver closed this as completed Jan 9, 2024