Ignore character data in mixed content XML #10
Comments
@nichtich This seems related to a bug in
I checked with Given I can't think of a way to replicate `<p> The <emph> cat </emph> ate the <foreign>grande croissant</foreign>. I didn't </p>` since @nichtich Do you think it's worth explicitly removing it, versus just leaving that up to the end user? The code to implement this change feels a bit hacky to me. I'll see if I can figure out a better solution.
It's a bug if you use libxml2 with indent value other than
You could ignore character data in mixed content XML unless indent is
This would require some logic that would see if there are other keys that aren't But because of #18, I'm no longer loading anything into memory and only read one token at a time, which makes this much trickier.
As discussed at #7, document-oriented XML requires another JSON serialization anyway, and supporting mixed content XML in the current JSON form is error-prone. Even this simple case seems to be handled incorrectly (note the additional whitespace after `x`).

The reason is that whitespace handling in mixed content elements requires a more sophisticated algorithm (see this explanation).
Better to ignore character data (`#text`) when there are child elements.
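To illustrate the proposed rule, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. It is not this project's code: the function name `element_to_dict` and the `@` prefix for attributes are assumptions made for the example, and only the `#text` key comes from the discussion above. The sketch loads the whole document into memory, so it does not address the token-at-a-time constraint mentioned in the comments.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element to a dict, dropping character data in mixed content.

    Hypothetical helper: attributes are stored under "@name" (an assumption),
    leaf text is stored under "#text".
    """
    result = {"@" + name: value for name, value in elem.attrib.items()}
    children = list(elem)
    if children:
        # The element has child elements, so its own text (elem.text) and the
        # text following each child (child.tail) are ignored entirely.
        for child in children:
            value = element_to_dict(child)
            if child.tag in result:
                # Repeated tag: collect the values in a list.
                existing = result[child.tag]
                if not isinstance(existing, list):
                    result[child.tag] = existing = [existing]
                existing.append(value)
            else:
                result[child.tag] = value
    else:
        # Leaf element: keep character data unless it is whitespace only.
        text = elem.text or ""
        if text.strip():
            result["#text"] = text
    return result


if __name__ == "__main__":
    xml = (
        "<p> The <emph> cat </emph> ate the "
        "<foreign>grande croissant</foreign>. I didn't </p>"
    )
    print(json.dumps(element_to_dict(ET.fromstring(xml))))
```

With the mixed content example from the comments, the text directly inside `<p>` is dropped and only the `emph` and `foreign` children remain, each with its own `#text`. Whether whitespace inside leaf elements should also be trimmed is a separate decision that this sketch leaves untouched.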