This repository has been archived by the owner on Mar 22, 2021. It is now read-only.

Large JSON files cause a memory error #1

Open
adein opened this issue Mar 1, 2017 · 2 comments

adein commented Mar 1, 2017

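Large Hangouts JSON files make the script crash with a MemoryError while parsing. json.load reads the entire file into memory and decodes it in one piece before it starts parsing, so peak memory use grows with the size of the file. The trace below appears to come from a 32-bit Python 3.6 install (Python36-32), where the limited address space makes this easier to hit.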

Short-term solution:
Edit the file in a text editor that supports large files and remove any Hangouts/non-SMS conversations.

Long-term solution:
Change the JSON parsing to use a third-party stream-based parser instead of Python's built-in json module.
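To illustrate what "stream-based" means here, the following is a rough stdlib-only sketch, not the third-party parser proposed above. It assumes the conversations sit in an array of JSON objects under a known top-level key; the function name `iter_array_items` and the key are hypothetical. It finds the key by plain text search and retries decoding as more data arrives, so a real streaming parser would be more robust and faster.

```python
import json


def iter_array_items(fp, key, chunk_size=64 * 1024):
    """Yield the elements of the JSON array stored under `key`, one at a time.

    `fp` is a text-mode file object. Assumes the array elements are objects
    and that the first textual occurrence of "key" is the one we want.
    """
    decoder = json.JSONDecoder()
    buf = ""
    eof = False

    def fill():
        # Append the next chunk to the buffer, or note that the file is exhausted.
        nonlocal buf, eof
        chunk = fp.read(chunk_size)
        if chunk:
            buf += chunk
        else:
            eof = True

    # Phase 1: skip ahead to the '[' that follows the first occurrence of "key".
    marker = json.dumps(key)
    while True:
        pos = buf.find(marker)
        if pos != -1:
            start = buf.find("[", pos + len(marker))
            if start != -1:
                buf = buf[start + 1:]
                break
        elif len(buf) > len(marker):
            # Keep only a tail long enough to hold a marker split across chunks.
            buf = buf[-len(marker):]
        if eof:
            raise ValueError("no array found under key %r" % key)
        fill()

    # Phase 2: decode one element at a time, reading more data when needed.
    while True:
        buf = buf.lstrip(" \t\r\n,")  # drop whitespace and separating commas
        if buf.startswith("]"):
            return  # end of the array
        try:
            item, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            if eof:
                raise  # truncated or malformed input
            fill()  # element is incomplete; read more and retry
            continue
        yield item
        buf = buf[end:]
```

Each yielded item can be processed and discarded before the next one is read, so memory use is bounded by the largest single conversation rather than by the whole file.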

Error trace:

Traceback (most recent call last):
  File "hangouts_to_sms.py", line 15, in <module>
    conversations, self_gaia_id = hangouts_parser.parse_input_file(HANGOUTS_JSON_FILE, YOUR_PHONE_NUMBER)
  File "..\hangouts_to_sms-master\hangouts_parser.py", line 23, in parse_input_file
    data = json.load(data_file, object_hook=lambda d: Namespace(**d))
  File "..\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "..\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 349, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
MemoryError

catskul commented May 22, 2017

I have a branch that uses ijson, which might be a candidate for a pull request.
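For readers unfamiliar with ijson, here is a minimal sketch of how it can iterate over array elements without loading the whole file. This is not the code from the branch mentioned above. The helper name `iter_conversations` and the `conversations` key in the prefix are assumptions; the real export may use a different key.

```python
def iter_conversations(fp, prefix="conversations.item"):
    """Yield one conversation at a time from a binary-mode file object.

    `prefix` is ijson's path to the array elements; "conversations" is an
    assumed top-level key, so adjust it to match the real export.
    """
    # Imported lazily so this module still loads where ijson is not installed.
    import ijson  # third-party: pip install ijson

    for conversation in ijson.items(fp, prefix):
        yield conversation


# Usage sketch ("Hangouts.json" is a placeholder file name):
# with open("Hangouts.json", "rb") as f:
#     for conversation in iter_conversations(f):
#         ...  # handle one conversation, then let it be garbage-collected
```

Note that ijson yields plain dicts and lists, so code that relies on the attribute-style objects produced by the current object_hook would need a conversion step.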

I have to verify that it spits out the same output as the current version before I submit it.
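One simple way to run that check, assuming both versions write their result to a file, is to compare digests of the two output files. The function names and the idea of comparing whole files byte for byte are assumptions for illustration; if the new version orders entries differently, a structural comparison would be needed instead.

```python
import hashlib


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(old_output_path, new_output_path):
    """True if the two output files are byte-for-byte identical."""
    return file_digest(old_output_path) == file_digest(new_output_path)
```

Reading in chunks keeps the check itself from running into the same memory problem on large outputs.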

Will try tonight.

@gbrown2036

Thanks, catskul. Keep us posted.
