Commit

Merge pull request #36 from XiaoMi/develop
Fix Chinese punctuation bug
guoyuankai authored Mar 16, 2021
2 parents 240b13d + 94eaaf5 commit 0721ed7
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion minlp-tokenizer/minlptokenizer/tokenizer.py
@@ -55,7 +55,7 @@ def format_string(ustring):
         inside_code = ord(uchar)
         if inside_code == 12288:  # convert full-width space directly
             inside_code = 32
-        elif 65281 <= inside_code <= 65374:  # convert full-width characters (except space)
+        elif 65296 <= inside_code <= 65305 or 65313 <= inside_code <= 65339:  # convert full-width characters (except space and English punctuation)
             inside_code -= 65248
         half_wide_string += chr(inside_code)
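
For reviewers, the effect of the new condition can be checked with a small standalone sketch. Only the loop body comes from this hunk; the initialization of `half_wide_string`, the `for` loop header, and the `return` statement are assumptions about the rest of `format_string`, and the `FULLWIDTH_OFFSET` name is introduced here purely for illustration.

```python
# Hypothetical standalone reconstruction of format_string after this commit.
# Only the if/elif body is taken from the diff; the surrounding lines are assumed.

FULLWIDTH_OFFSET = 65248  # 0xFEE0: distance from a full-width form to its ASCII counterpart


def format_string(ustring):
    """Convert selected full-width characters in ustring to half-width."""
    half_wide_string = ""  # assumed initialization (not shown in the hunk)
    for uchar in ustring:  # assumed loop header (not shown in the hunk)
        inside_code = ord(uchar)
        if inside_code == 12288:  # U+3000 full-width space -> ASCII space
            inside_code = 32
        # 65296..65305 are full-width digits (U+FF10..U+FF19);
        # 65313..65339 are U+FF21..U+FF3B (full-width A..Z plus the left bracket).
        elif 65296 <= inside_code <= 65305 or 65313 <= inside_code <= 65339:
            inside_code -= FULLWIDTH_OFFSET
        half_wide_string += chr(inside_code)
    return half_wide_string  # assumed return (not shown in the hunk)


if __name__ == "__main__":
    # Full-width digits and capitals are folded; the full-width comma is preserved.
    print(format_string("\uff21\uff22\uff23\uff11\uff12\uff13\uff0c\u4f60\u597d"))
```

With the previous range (65281 to 65374), full-width punctuation such as U+FF0C was folded to ASCII, which is the behavior this commit removes. As written, the new condition also leaves full-width lowercase letters (65345 to 65370) unconverted and still converts 65339 (the full-width left square bracket); whether that is intended is not stated in the commit.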

2 changes: 1 addition & 1 deletion minlp-tokenizer/setup.py
@@ -33,7 +33,7 @@

 setup(
     name='minlp-tokenizer',
-    version='3.3.0',
+    version='3.3.1',
     description='MiNLP-Tokenizer中文分词工具',
     author='Yuankai Guo, Liang Shi, Yupeng Chen',
     author_email='[email protected], [email protected]',
