Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove controls unicode #77

Closed
wants to merge 2 commits into from

Conversation

guanbo
Copy link

@guanbo guanbo commented Aug 19, 2015

@guanbo guanbo force-pushed the remove-controls-unicode branch from 533fdbc to beb9dc7 Compare August 19, 2015 16:58
@dead-horse
Copy link
Member

rfc 里面写的:

All Unicode characters may be placed within the
quotation marks except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).

我理解如果要在 json 格式里面传 control characters 需要 escape,如果未转义确实是应该报错。这个地方是不是 wechat 请求拿到的响应自身是有问题的?

@fengmk2
Copy link
Member

fengmk2 commented Aug 19, 2015

按 MDN 里面的 Polyfill 实现,确实正确的做法是需要对控制字符 /[\\"\u0000-\u001F\u2028\u2029]/ 进行重新编码的。
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON#Polyfill

@@ -671,7 +671,7 @@ function parseJSON(data) {
data: null
};
try {
result.data = JSON.parse(data);
result.data = JSON.parse(data.replace(/[\u0000-\u001f\u0080-\u009f]/g, ""));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

加个参数才替换吧,这样你的场景不可能让微信改的,也能实现。

function parseJSON(data, trimControlCharacters) {
  // ...
  if (trimControlCharacters) {
    data = data.replace(/[\u0000-\u001f\u0080-\u009f]/g, "");
  }
  // json parse data
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

完整的实现应该不是替换成空白符,而是替换成正确的 encode 字符。

var escMap = {
  '"': '\\"',       // \u0022
  '\\': '\\\\',     // \u005c
  '\b': '\\b',    // \u0008 
  '\f': '\\f',      // \u000c
  '\n': '\\n',    // \u000a
  '\r': '\\r',      // \u000d
  '\t': '\\t'       // \u0009
};
var escFunc = function (m) { 
  return escMap[m] || '\\u' + (m.charCodeAt(0) + 0x10000).toString(16).substr(1); 
};
var escRE = /[\u0000-\u001F\u005C]/g;
data = data.replace(escRE, escFunc);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我测试了一下,只有 \u0000-\u001F 会出问题, \u0080-\u009f 不会有问题的。

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

呃。。。JSON.parse('{"foo": "bar \u005c"}') 会有问题

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var escRE = /[\u0000-\u001F\u005C]/g;

@fengmk2

  • \u005C 不能替换,否则会将合法JSON String异化。Polyfill给出的是stringify的实现与parse是不同的。
  • escFunc会导致case digest auth success in httpbin过不了。
  • 数据测试后发现C0控制(0000-001F)符还是用空字符替换比较稳妥,虽然有部分失真。

@dead-horse
Copy link
Member

我觉得不太应该在 urllib 来做兼容,如果确实要兼容服务端返回的非法 json,应该在应用层面做:

var res = yield urllib.request(url)
var data = res.data.toString()
data.replace(/xx/g, '')
data = JSON.parse(data)

@guanbo
Copy link
Author

guanbo commented Aug 20, 2015

我觉得不太应该在 urllib 来做兼容,如果确实要兼容服务端返回的非法 json,应该在应用层面做:

确实在urllib做有搞脏的嫌疑,但世界本身不完美。多一些包容可以获取更多的使用场景。

@fengmk2
Copy link
Member

fengmk2 commented Aug 20, 2015

@guanbo 我加个参数,如果开启了此参数,就自动去除控制符吧。

@guanbo
Copy link
Author

guanbo commented Aug 20, 2015

我加个参数,如果开启了此参数,就自动去除控制符吧。

  • OK, 你改吧。这个pr先关了。
  • 改完这边,顺便把node-web/wechat-api那边也上参数吧。

@LaPoPvLe
Copy link

我想说, 应该只解决了部分, 还是会出现无法正常解析的情况.
不太明白原因是啥, 我就把错误的放出来, 大神么看看

"nickname":"⚈้̤͡ ˌ̫̮ ⚈้̤͡" "," sex":2,
"language":"en",

也看了下unicode应该是不在那个区间的, options.fixJSONCtlChars开启下, 还是会报错.

@fengmk2
Copy link
Member

fengmk2 commented Aug 24, 2015

将 buffer 也贴出来吧,拿到二进制数据会比较好调试。

@LaPoPvLe
Copy link

我单发你文件吧?

@fengmk2
Copy link
Member

fengmk2 commented Aug 24, 2015

可以,发我邮箱吧

@LaPoPvLe
Copy link

回复邮件说地址错误...

@LaPoPvLe
Copy link

有收到么

@fengmk2
Copy link
Member

fengmk2 commented Aug 24, 2015

@LaPoPvLe 呃。。。没有,发到 [email protected]

@LaPoPvLe
Copy link

已发

@fengmk2
Copy link
Member

fengmk2 commented Aug 24, 2015

还是没有收到。。。难道在垃圾邮箱里面?我看看

@airyland
Copy link

wechat-api还是有这个报错,现在如何解决?

@justrustc
Copy link

我加了还是会遇到这个问题
wechat-api:1.2.3
urllib:2.6.0

@hengkx
Copy link

hengkx commented Apr 6, 2016

wechat-api:1.27.1
urllib:2.8.0

还是有问题

@glonely
Copy link

glonely commented Apr 27, 2016

@fengmk2 麻烦看下这个.
value中有双引号时,设置 fixJSONCtlChars 会解析错误
<Buffer 7b 22 73 75 62 73 63 72 69 62 65 22 3a 31 2c 22 6f 70 65 6e 69 64 22 3a 22 6f 66 78 6a 51 76 37 78 65 51 6c 39 4d 38 31 67 59 4c 6d 48 52 6d 79 4b 5f ... > '{"subscribe":1,"openid":"ofxjQv7xeQl9M81gYLmHRmyK_BS8","nickname":"[“ 芯安 " ]","sex":1,"language":"zh_CN","city":"Xuchang","province":"Henan","country":"China","headimgurl":"http://wx.qlogo.cn/mmopen/x6MqrqVWMJ14QaRhaWNw9YvucuiaLVrouhAg7mdKkE8YtuclVVgubIxdEwibnMvd5UY0fOPUBrk7OnR095s06onukXfUkcYPSY/0","subscribe_time":1461675398,"remark":"","groupid":0}'

@fengmk2
Copy link
Member

fengmk2 commented Apr 27, 2016

@glonely 如何重现呢?

@glonely
Copy link

glonely commented May 3, 2016

@fengmk2
我上面给出的JSON就是解析出错的内容,直接用它跑下就能重现..
另外,我有发微信返回的原始buffer在你邮箱.
麻烦抽空看下,谢谢。

@fengmk2
Copy link
Member

fengmk2 commented May 3, 2016

@glonely 你给的 buffer 是截断的,能否给完整的。。。

image

@glonely
Copy link

glonely commented May 3, 2016

image
对于双引号,我本地是临时这样处理.

@glonely
Copy link

glonely commented May 3, 2016

以下是原始buffer.toJSON()的内容:{"type":"Buffer","data":[123,34,115,117,98,115,99,114,105,98,101,34,58,49,44,34,111,112,101,110,105,100,34,58,34,111,102,120,106,81,118,120,114,86,109,79,73,72,85,52,107,49,52,65,87,98,107,71,77,102,82,65,99,34,44,34,110,105,99,107,110,97,109,101,34,58,34,238,132,142,32,126,65,108,101,116,116,97,226,153,170,92,34,92,34,238,132,155,34,44,34,115,101,120,34,58,49,44,34,108,97,110,103,117,97,103,101,34,58,34,122,104,95,67,78,34,44,34,99,105,116,121,34,58,34,79,107,108,97,104,111,109,97,32,67,105,116,121,34,44,34,112,114,111,118,105,110,99,101,34,58,34,79,107,108,97,104,111,109,97,34,44,34,99,111,117,110,116,114,121,34,58,34,85,110,105,116,101,100,32,83,116,97,116,101,115,34,44,34,104,101,97,100,105,109,103,117,114,108,34,58,34,104,116,116,112,58,92,47,92,47,119,120,46,113,108,111,103,111,46,99,110,92,47,109,109,111,112,101,110,92,47,120,54,77,113,114,113,86,87,77,74,49,52,81,97,82,104,97,87,78,119,57,88,122,105,98,50,56,85,99,75,118,107,78,115,83,49,101,67,98,73,85,65,86,98,65,83,103,104,80,50,105,99,102,102,55,121,116,105,97,80,97,98,110,51,53,98,117,81,112,87,120,121,98,116,120,111,57,111,87,114,103,80,79,51,98,117,105,97,98,110,66,75,114,87,106,75,53,88,86,121,92,47,48,34,44,34,115,117,98,115,99,114,105,98,101,95,116,105,109,101,34,58,49,52,54,49,57,56,57,53,50,48,44,34,117,110,105,111,110,105,100,34,58,34,111,50,120,87,118,117,79,84,54,105,45,76,77,105,66,80,102,85,78,98,113,75,107,71,79,107,118,107,34,44,34,114,101,109,97,114,107,34,58,34,34,44,34,103,114,111,117,112,105,100,34,58,48,44,34,116,97,103,105,100,95,108,105,115,116,34,58,91,93,125]}

Date: Mon, 2 May 2016 18:39:35 -0700
From: [email protected]
To: [email protected]
CC: [email protected]; [email protected]
Subject: Re: [node-modules/urllib] Remove controls unicode (#77)

@glonely 你给的 buffer 是截断的,能否给完整的。。。


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub

@glonely
Copy link

glonely commented May 3, 2016

原始JSON中有 \“ 时 replaceJSONCtlChars这个会先将 ” 替换成 ",然后又将 \ 替换成 \ ,全部替换完后" 变成了 \",下一步JSON.parse(data)就异常了

@glonely
Copy link

glonely commented May 20, 2016

你好,对于urllib的fixJSONCtlChars 参数,是否使用 这个参数,使用者其实是不能确定的.比如微信的获取用户信息接口,用户昵称有些需要fix,有些不能使用fix感觉urllib内应该先尝试parse,如异常再使用fixJSONCtlChars.调用某个接口,如返回的JSON中有双引号,并且 " 这样了转义传递了fixJSONCtlChars参数后,replaceJSONCtlChars方法会先将双引号转成 " ,再将 \ 转成 \ 加上最初的转义符后一共有5个斜杠,导致执行replaceJSONCtlChars后JSON.parse出现异常.
Date: Mon, 2 May 2016 18:39:35 -0700
From: [email protected]
To: [email protected]
CC: [email protected]; [email protected]
Subject: Re: [node-modules/urllib] Remove controls unicode (#77)

@glonely 你给的 buffer 是截断的,能否给完整的。。。


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub

@jameskid007
Copy link

还是有这个问题。

@fengmk2 fengmk2 reopened this Sep 6, 2017
@fengmk2 fengmk2 closed this Sep 6, 2017
@fengmk2
Copy link
Member

fengmk2 commented Sep 6, 2017

@jameskid007 具体是什么问题?

@fengmk2
Copy link
Member

fengmk2 commented Sep 8, 2017

@guanbo @jameskid007 @glonely 没法一刀切解决所有问题,让使用者按自己的场景来自行解决。 #264

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants