Improving CJK Characters Support #186
Hi, thanks for the translations. Just in case, there's been a previous translation that could possibly be used as a reference. Regarding space after sentences, in general, I encourage translators to apply their best judgement. In this case, Crowdin separates translations by sentences, so I think it's automatically adding the spaces after periods and/or commas. I don't think it will allow you to fix that issue by yourself, so I'll try to see if there's an option somewhere that fixes it. Regarding spaces between CJK characters and Latin characters, I've noticed the previous NES translation uses characters that include padding (i.e.
Thanks for your reply. I've seen the previous translation. I am not going to change anything on that page because I'm not familiar with NES at all. But I will try to be consistent with its conventions.
Unfortunately, that's another issue. The usage of different brackets ( Adding spaces is what the NES translations actually did. e.g., The best solution I've thought of is to import JS files into the Chinese-translated Markdown files only. There are alternatives too: a script that processes the text (if that's possible), or a Crowdin app/plugin (I don't know much about it yet).
Hmm, now that I think of it, when the website is generated, the Markdown is converted into HTML, and in between I can add regex calls. So, while I experiment with this in the build scripts, could you try to make the translation without using the extra spaces? Hopefully this will work and I'll be able to port it to other CJK scripts. Thanks!
I prefer workarounds without JS running in browsers too. I realized that your articles will be published not only on the website but also as EPUB using pandoc. In fact, there's no need to insert spaces in the EPUB, because most e-book readers can take care of the padding. On the other hand, spaces after sentences still need to be dropped. Trimming off the spaces after sentences seems simple to me: just remove all the spaces (and line feeds) after
Now please let me introduce the regex rules for adding HTML tags with JS. The code below is part of my hexo plugin. Please feel free to use or modify it, and I hope I can explain it clearly.
// Pattern rules taken from text-autospace.js
const hanzi = '[\u2E80-\u2FFF\u31C0-\u31EF\u3300-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF\uFE30-\uFE4F]',
  punc = {
    base: "[@&=_\\$%\\^\\*-\\+/]",
    open: "[\\(\\[\\{<‘“]",
    close: "[,\\.\\?!:\\)\\]\\}>’”]"
  },
  latin = '[A-Za-z0-9\u00C0-\u00FF\u0100-\u017F\u0180-\u024F\u1E00-\u1EFF]' + '|' + punc.base,
  patterns = [
    RegExp('(' + hanzi + ')(' + latin + '|' + punc.open + ')', 'gi'),
    RegExp('(' + latin + '|' + punc.close + ')(' + hanzi + ')', 'gi')
  ];

Here are the explanations of each variable:
Assume that tags named
html hl:after {
  content: ' ';
  display: inline;
  font-family: inherit;
  font-size: 0.8em;
}
html code hl,
html pre hl,
html kbd hl,
html samp hl,
html ruby hl,
html .tag-list-item hl {
  display: none;
}
html ol > hl,
html ul > hl {
  display: none;
}

Don't worry if any customized tag ends up in the wrong place; we still have a chance to decide whether to show it or not with CSS.
I'm glad to do so and see if this helps other CJK translations, though there are only Chinese translations at present :-).
That's a great breakdown of the script and it will help me to port the regular expressions. Let me know when you get the Chinese translation ready and I'll test the regex. Many thanks!
I've just finished translating the Nintendo DS article (no extra spaces). Please handle it at a time that you deem appropriate.
Great, I've deployed it here for testing (it doesn't have the
I'm checking the regex effects on the Markdown article, and there seem to be the following bugs:
is replaced like this:
The regex is applied to Markdown, so I think that's creating some confusion in the rules (I'm assuming it was originally made for HTML?). I guess I just need to tweak the regex. But overall, this is very good progress and I really appreciate there's a new article available in Chinese. I'll try to find the causes of the regex problems meanwhile. Thanks.
Good news! Good to see my translation deployed. I may translate the GBA article later.
Okay, I will go through and check the spaces on Crowdin.
You are right, it was originally made for HTML. If I were you, I would give up on writing Markdown rules: Markdown is very flexible, so the regex may become too complex and lose readability. I suggest applying the replacements to the HTML files instead. Please tell me if you need it and I will modify my hexo plugin to handle HTML files as an executable. Node.js executables run slowly, but a few seconds per file sounds tolerable to me.
You are welcome. Please let me know if there's anything I can do to help. P.S. I found some more characters with spaces after them to be trimmed off. The whole list is:
Sounds good! By the way, don't forget to sign your name or username here so I can credit you for the translation.
Oops, And I've finished checking the spaces on Crowdin. ✌️
Thanks! In my case I've been trying to learn more about how to improve the styling and layout for Chinese-speaking audiences (using the Simplified Chinese script, in this case). I've recently changed the following (only visible in the Chinese articles):
From your perspective, do you think they improve the reading experience for Chinese readers?
Wow! They do help a lot! The font families cover the default fonts of most devices. It looks pretty good with text justified and indented.
Glad it helped! I think it will take me some time to get the regex rules to properly parse Latin text. However, I'm glad that I can improve the reading experience through CSS as well.
I am working on translating the Nintendo DS article into Chinese. There are 2 minor issues about CJK characters (Chinese, Japanese, and Korean) that I want to ask about.
Space after sentences
The commas and periods in Chinese are as wide as two Latin characters, the same as all the other Chinese characters. Therefore, we do not add spaces after periods and commas. Avoiding unnecessary spaces is easy when writing with markdown: do not append a new line or add a space after each sentence.
I would like to know if it is convenient to remove these spaces after sentences in the Chinese-translated Markdown file. I'm not sure if this breaks the translation workflow of Crowdin.
(It does not matter that much, so don't worry if not possible.)
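If the spaces cannot be avoided at the source, removing them could be scripted. Below is a minimal sketch, assuming only the full-width comma and period described above need handling; the function name `trimAfterCjkPunctuation` is hypothetical. It removes spaces and single line breaks after those two characters but keeps blank lines, so Markdown paragraph breaks survive.

```javascript
// Assumption: only the full-width comma and ideographic full stop are covered.
const cjkPunctuation = '，。';

// Matches one of the punctuation marks followed by either
//  - optional spaces, ONE line break not followed by a blank line, optional spaces, or
//  - one or more spaces/tabs.
const trailingSpace = new RegExp(
  '([' + cjkPunctuation + '])(?:[ \\t]*\\r?\\n(?![ \\t]*\\r?\\n)[ \\t]*|[ \\t]+)',
  'g'
);

// Hypothetical helper: drop the whitespace, keep the punctuation mark.
function trimAfterCjkPunctuation(text) {
  return text.replace(trailingSpace, '$1');
}

console.log(trimAfterCjkPunctuation('你好， 世界。 再见。'));
// prints "你好，世界。再见。"
```

More punctuation marks can be added to `cjkPunctuation` as needed.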
Spaces between CJK characters and Latin characters
It is suggested to add "padding" between CJK characters and Latin characters. The simplest and best way to do this is to import a js file. I would like to know if it is convenient to import it into your project. If not, as a compromise, I will manually add spaces when translating.