(Bug report) Upgrade caused minor corruption in some Chinese character note titles #4412
Comments
That's strange ... in the 0.61 upgrade, there was quite a lot of data migration, but note titles specifically were not touched at all. The only thing I can think of is the better-sqlite3 upgrade (which includes an SQLite upgrade) and the Electron upgrade, perhaps some Unicode support change or bug.
After numerous rounds of testing (literally hundreds), I have pinpointed the root cause and resolved the issue. In short, the culprit was an automatic encoding conversion.
I use mitmproxy to monitor the HTTP requests made while initializing a client from an existing server. The data flow is: server -> mitmproxy -> client. With mitmproxy in the path, the number of corrupted notes during initialization dropped from 30-40 to 3, and the corruption, previously random across all notes, became consistently reproducible on those same three notes.

While debugging, I discovered that the [...] However, the data remains intact within mitmproxy, suggesting that the server sends correct data but the client somehow fails to receive it accurately.

Further debugging led me to a suspicious code section. There, string concatenation in JavaScript implicitly converts each received chunk to a string in Node.js, so every chunk is decoded on its own. Most Chinese characters occupy 3 bytes in UTF-8. If such a character straddles the boundary between two chunks, its bytes are split into two parts, and neither part is a valid UTF-8 sequence by itself, so decoding produces "�". The issue can be reproduced by manually decoding the chunk, which confirms that it ends in the same corrupted string.

I've made a fix in #4522.
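The chunk-boundary effect described above can be sketched in a few lines of Node.js. This is only an illustration under stated assumptions: the sample title, the split offset, and the function names `concatAsStrings`, `concatAsBuffers`, and `decodeIncrementally` are hypothetical, and the two safe variants are common ways to avoid the problem, not necessarily what #4522 does.

```javascript
const { StringDecoder } = require("node:string_decoder");

// Hypothetical sample title; any text with multi-byte characters behaves the same.
const title = "笔记标题";
const bytes = Buffer.from(title, "utf8"); // 4 characters x 3 bytes = 12 bytes

// Split the bytes in the middle of the second character (offset 4),
// imitating a response body that arrives in two chunks.
const chunks = [bytes.subarray(0, 4), bytes.subarray(4)];

// Problematic pattern: `+=` converts each Buffer to a string on its own,
// so the incomplete byte sequences at the boundary decode to U+FFFD.
function concatAsStrings(parts) {
  let body = "";
  for (const part of parts) {
    body += part;
  }
  return body;
}

// Safe variant 1: collect the raw bytes and decode once at the end.
function concatAsBuffers(parts) {
  return Buffer.concat(parts).toString("utf8");
}

// Safe variant 2: StringDecoder holds back an incomplete trailing
// sequence until the next chunk completes it.
function decodeIncrementally(parts) {
  const decoder = new StringDecoder("utf8");
  let body = "";
  for (const part of parts) {
    body += decoder.write(part);
  }
  return body + decoder.end();
}

console.log(concatAsStrings(chunks) === title);     // false: contains U+FFFD
console.log(concatAsBuffers(chunks) === title);     // true
console.log(decodeIncrementally(chunks) === title); // true
```

Whether corruption appears depends entirely on where the chunk boundaries fall, which would be consistent with the corruption looking random without the proxy and stable with it.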
Hats off, awesome detective work!
Trilium Version
0.61.13
What operating system are you using?
Windows
What is your setup?
Local + server sync
Operating System Version
win10
Description
A user told me that some of their note titles were broken after the upgrade. I checked my own notes and found that a handful of note titles were slightly corrupted, about 30 out of 15,000.
During my investigation, I came across the peculiar character � within these titles. Notably, all the affected titles contained Chinese characters, although the original characters varied.
Error logs
No response