[BUG] XML UTF-8 with BOM fails #330
Comments
You can specify the encoding in parse(); the default is utf-8. IANA currently lists 250+ character encodings. Python natively supports a subset of 109 encodings (plus some Python-specific encodings). You cannot possibly expect xmltodict to know or to guess which one your input uses.
set "PYTHONIOENCODING=utf8"
Seems you're right, explicitly passing bytes with a BOM works just fine:
import xmltodict
xml = '''<?xml version="1.0"?><test>123</test>'''
xml = xml.encode("utf-8-sig")
out = xmltodict.parse(xml)
print(out) # {'test': '123'}
So maybe the error is somewhere else? Either the file has a different encoding, or the other libs you're using are modifying the string/bytes somehow.
Edit: these also work:
from io import BytesIO, StringIO
b = BytesIO(b'\xef\xbb\xbf<?xml version="1.0"?><test>123</test>')
print(xmltodict.parse(b.read()))
b = StringIO(b'<?xml version="1.0"?><test>123</test>'.decode("utf-8-sig"))
print(xmltodict.parse(b.read()))
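The second possibility raised above, that something modifies the bytes before they reach the parser, can be sketched with the standard library alone. The example below calls `xml.parsers.expat` directly rather than xmltodict, and the choice of cp1252 as the wrong codec is purely an assumption for illustration; nothing in this thread confirms that this is what happens in the reporter's setup.

```python
import codecs
from xml.parsers import expat

raw = codecs.BOM_UTF8 + b'<?xml version="1.0"?><test>123</test>'


def parses_ok(data: bytes) -> bool:
    """Return True if expat accepts the document, False on a parse error."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)
        return True
    except expat.ExpatError:
        return False


# Raw bytes that still carry the BOM: expat recognizes and skips it.
ok_bytes = parses_ok(raw)

# Decoded with utf-8-sig (BOM stripped), then re-encoded as plain UTF-8.
ok_sig = parses_ok(raw.decode("utf-8-sig").encode("utf-8"))

# Assumed failure mode: the file is decoded with the wrong codec, so the
# three BOM bytes turn into three ordinary characters ahead of "<?xml".
mangled = raw.decode("cp1252")
ok_mangled = parses_ok(mangled.encode("utf-8"))

print(ok_bytes, ok_sig, ok_mangled)
```

If the file is opened in text mode with a platform default encoding somewhere upstream, the parser would see stray characters in front of the XML declaration, which is one way a BOM can appear to be the culprit even though the parser itself handles it.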
Just using https://github.com/twardoch/yaplon:
D:\Pyenv310>xml22yaml -i "d:\Pyenv310\TEST\Alarms.xml" -o "d:\Pyenv310\TEST\Alarms.yaml"
It is failing here: https://github.com/martinblech/xmltodict/blob/master/xmltodict.py#L378
Called from here: https://github.com/twardoch/yaplon/blob/master/yaplon/reader.py#L71
There should be an issue around here: https://github.com/martinblech/xmltodict/blob/master/xmltodict.py#L341
You can test with any XML file that has a BOM:
Regards.