Errors with non-escaped quotes and multiline JSON-LD #175
Comments
Could you check if #126 fixes this issue?
@Gallaecio I'll give it a shot! Do you know the easiest way to test a package like this? I don't do Python development very often, so I'm not very familiar with it, but right now I'm using pipenv, if that makes a difference. Should I just clone your fork and put it in my project to see if it works?
I think this should do the job:
OK, it installed successfully with that command (though not under zsh, for some reason). Anyway, I'm still getting the error, just at a different position:
I ended up manually adding your fork so I could print out what was going on, and I think I figured it out. In the replacement where it fails, the character before the error position is a space, so I guess it only fails once it hits another non-space character.

I found this answer on Stack Overflow that attempts to solve pretty much the same issue as your script. I tweaked it a bit to match yours more closely, and this is what I ended up with. It works great for my situation:

    import json
    from json import JSONDecodeError

    import jstyleson

    # HTML_OR_JS_COMMENTLINE is assumed to be the comment-stripping regex
    # already defined in the module this function is patched into.


    def parse_json(json_string):
        try:
            return json.loads(json_string, strict=False)
        except ValueError:
            pass
        # sometimes JSON-decoding errors are due to leading HTML or JavaScript comments
        json_string = HTML_OR_JS_COMMENTLINE.sub('', json_string)
        while True:
            try:
                return jstyleson.loads(json_string, strict=False)
            except JSONDecodeError as error:
                if (
                    hasattr(error, 'msg')
                    and error.msg == "Expecting ',' delimiter"
                ):
                    # position of unescaped '"' before unexpected char
                    open_quote_pos = json_string.rfind('"', 0, error.pos)
                    # escape opening '"' char
                    json_string = json_string[:open_quote_pos] + '\\' + json_string[open_quote_pos:]
                    # position of corresponding closing '"' (+2 for inserted '\')
                    close_quote_pos = json_string.find('"', open_quote_pos + 2)
                    # escape closing '"' char
                    json_string = json_string[:close_quote_pos] + '\\' + json_string[close_quote_pos:]
                    continue
                # any other decoding error is not one we know how to repair
                raise

The main difference is that it looks for both an opening and a closing quote to replace. I found that without this, mine still failed with either a
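For anyone who wants to try the quote-escaping idea without extruct or jstyleson installed, here is a minimal sketch that uses only the standard library. The function name `escape_inner_quotes`, the `max_repairs` cap, and both sample strings are hypothetical and are not part of extruct or of the code above; the samples only imitate the kind of markup described in this issue. The cap and the `-1` checks are extra guards I added so that a payload the heuristic cannot repair raises instead of looping or corrupting the string.

```python
import json


def escape_inner_quotes(json_string, max_repairs=100):
    """Parse JSON, escaping one unescaped inner quote pair per failed attempt.

    Hypothetical helper: max_repairs is an added safety cap, not part of
    the approach quoted in the thread.
    """
    for _ in range(max_repairs):
        try:
            return json.loads(json_string, strict=False)
        except json.JSONDecodeError as error:
            if error.msg != "Expecting ',' delimiter":
                raise
            # last '"' before the unexpected character: the stray opening quote
            open_quote_pos = json_string.rfind('"', 0, error.pos)
            if open_quote_pos == -1:
                raise
            json_string = (
                json_string[:open_quote_pos] + '\\' + json_string[open_quote_pos:]
            )
            # matching closing quote (+2 skips the inserted '\' and the quote)
            close_quote_pos = json_string.find('"', open_quote_pos + 2)
            if close_quote_pos == -1:
                raise
            json_string = (
                json_string[:close_quote_pos] + '\\' + json_string[close_quote_pos:]
            )
    raise ValueError("gave up after %d repair attempts" % max_repairs)


# Made-up payloads with unescaped quotes inside a string value.
html_sample = '{"description": "<a href="/x">link</a>", "name": "Kitchen"}'
spaced_sample = '{"d": "He said "hi" there"}'

html_result = escape_inner_quotes(html_sample)
spaced_result = escape_inner_quotes(spaced_sample)
```

The second sample covers the case mentioned above, where the stray closing quote is followed by a space: the parser skips the whitespace and reports the error at the next non-space character, so searching backwards with `rfind` still lands on the right quote. This remains a heuristic; it can mis-repair input where the delimiter error has some other cause.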
I'm not sure if this is an issue with extruct, or if there's anything I can do with extruct (or otherwise) to get around this issue, but I've been running into errors when attempting to parse JSON-LD from this site.
This is the first error I'm getting:
I believe this is because there are quotes in the description field's HTML that are not escaped. I'm also sometimes running into an issue where the description field is broken across multiple lines, which makes it invalid JSON.
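The two failure modes can be reproduced with the standard library alone. The sketch below is only an illustration: both payloads are made-up stand-ins rather than the actual JSON-LD from the site, and `try_parse` is a hypothetical helper. It suggests that the multiline case can be handled by passing `strict=False` to `json.loads`, while the unescaped-quote case fails either way.

```python
import json

# Made-up stand-ins for the problematic JSON-LD (not the real page content).
unescaped_quotes = '{"description": "<a href="/x">link</a>"}'
raw_newline = '{"description": "line one\nline two"}'  # literal newline inside the string


def try_parse(text, **kwargs):
    """Return (parsed, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text, **kwargs), None
    except json.JSONDecodeError as error:
        return None, error.msg


# Unescaped quotes end the string early, so the parser expects a ',' next.
quotes_result = try_parse(unescaped_quotes)

# A raw newline is a control character, which strict parsing rejects.
newline_strict = try_parse(raw_newline)

# strict=False allows control characters inside strings.
newline_lenient = try_parse(raw_newline, strict=False)
```

Here `quotes_result` carries the "Expecting ',' delimiter" message, `newline_strict` carries an "Invalid control character" message, and `newline_lenient` parses successfully.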
So my question is: is there anything in this package to handle improperly formatted JSON like in this example, or would I have to do some pre-processing on the HTML body before I pass it to extruct? And if I do have to handle this on my own, does anyone have any ideas for a consistent way to handle it?
Here is the site I'm having an issue with: https://www.foodpantries.org/li/nm_roswell_saint-peters-church-community-kitchen
And here is the JSON-LD: