
Support UTF8String/WideString in JsonObject/JsonArray #31

Open
zuobaoquan opened this issue Aug 14, 2016 · 5 comments

Comments

@zuobaoquan

I have to serialize and transmit data structures like this:

type
  TFoo = class
    Html: WideString;
    FileContent: UTF8String;
  end;

For example, the producer may be Internet Explorer or an IDE, which dictates the type/encoding.

It seems that I have to cast them to and from string on both sides. Since the content is fairly large, it would be great if the object model supported these two types natively. It should also be more efficient, since I use UTF-8 encoding for transmission.

@ahausladen
Owner

UTF8String would be a good addition.

WideString, on the other hand, is a copy-on-assign data type, so the data would still be copied into the internal data structure. It makes no difference whether it is copied into another WideString or into a UnicodeString (which then uses copy-on-write internally).
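A minimal Delphi sketch of that difference, assuming a Windows target and Delphi 2009 or later (where WideString wraps a COM BSTR and string is a reference-counted UnicodeString). It compares the buffer pointers after an assignment: the UnicodeString copy shares the original buffer, while the WideString copy gets its own.

```pascal
program StringCopySemantics;

{$APPTYPE CONSOLE}

var
  W1, W2: WideString;
  U1, U2: UnicodeString;
begin
  // StringOfChar builds heap-allocated strings, so no string literal
  // (which has a reference count of -1) affects the comparison.
  U1 := StringOfChar('x', 1000);
  W1 := StringOfChar('x', 1000);

  U2 := U1; // UnicodeString: only the reference count is incremented
  W2 := W1; // WideString: a second buffer is allocated and the data copied

  Writeln('UnicodeString shares buffer: ', (Pointer(U1) = Pointer(U2)));
  Writeln('WideString shares buffer:    ', (Pointer(W1) = Pointer(W2)));
end.
```

On Windows this should report TRUE for UnicodeString and FALSE for WideString. WideString is implemented differently on other platforms, so treat the comparison as Windows-specific.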

When I initially developed JsonDataObjects, I thought about using only UTF8String internally to reduce memory usage. But that would have required a conversion every time a property is accessed.

Maybe I should add logic so that, when you parse a UTF-8 stream, all strings are stored as UTF-8, and when you access a property through a UnicodeString getter, the value is converted automatically and the UTF-16 string replaces the internally stored UTF-8 string. That gives you the best of both worlds: only strings that are actually accessed are converted to UTF-16, which makes the UTF-8 parser a little faster and saves memory (except for code points that need more than two bytes in UTF-8).
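The lazy-conversion idea could look roughly like this. It is a hypothetical illustration for Delphi 2009 or later, not JsonDataObjects code; the record TLazyJsonString and its members are made up for this example. A value parsed from a UTF-8 stream is kept as a UTF8String. The first UnicodeString read converts it once and drops the UTF-8 copy, and values that are never read can be written back out without any conversion.

```pascal
program LazyUtf8Demo;

{$APPTYPE CONSOLE}

type
  // Hypothetical holder for one string value; not part of JsonDataObjects.
  TLazyJsonString = record
  private
    FUtf8: UTF8String;     // value as read from a UTF-8 stream
    FUtf16: UnicodeString; // filled in on first UnicodeString access
    FIsUtf16: Boolean;     // True once FUtf16 has replaced FUtf8
  public
    procedure SetFromUtf8(const Value: UTF8String);
    function AsString: UnicodeString;
    function AsUtf8: UTF8String;
    property IsUtf16: Boolean read FIsUtf16;
  end;

procedure TLazyJsonString.SetFromUtf8(const Value: UTF8String);
begin
  FUtf8 := Value; // reference-counted assignment; the bytes are not copied
  FUtf16 := '';
  FIsUtf16 := False;
end;

function TLazyJsonString.AsString: UnicodeString;
begin
  if not FIsUtf16 then
  begin
    FUtf16 := UTF8ToString(FUtf8); // convert exactly once
    FUtf8 := '';                   // release the UTF-8 copy
    FIsUtf16 := True;
  end;
  Result := FUtf16;
end;

function TLazyJsonString.AsUtf8: UTF8String;
begin
  if FIsUtf16 then
    Result := UTF8String(FUtf16) // value was accessed: encode on output
  else
    Result := FUtf8;             // never accessed: pass the bytes through
end;

var
  Value: TLazyJsonString;
begin
  // 'Gr' + U+00FC + 'n' ("Grün") is 4 UTF-16 code units and 5 UTF-8 bytes.
  Value.SetFromUtf8(UTF8String('Gr' + #$00FC + 'n'));
  Writeln(Length(Value.AsUtf8));   // 5: still stored as UTF-8
  Writeln(Value.IsUtf16);          // FALSE: no conversion yet
  Writeln(Length(Value.AsString)); // 4: converted on first access
  Writeln(Value.IsUtf16);          // TRUE: UTF-16 copy replaced UTF-8
end.
```

One trade-off of this design: a getter that mutates the stored value is not safe for concurrent reads without extra synchronization. A value that has been read once also costs an encoding step when it is written back to a UTF-8 stream.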

@zuobaoquan
Author

OK, let's forget about WideString. Although there are potential solutions, I can live with that.

UTF8String is more useful. Your idea is brilliant :-)

@zuobaoquan
Author

BTW, following your idea: when parsing a UTF-8 stream, will the underlying data be a UTF8String or a managed memory buffer?
In the former case, will the underlying UTF8String always be freed when objects containing string members are serialized to or deserialized from a UTF-8 stream?

@ahausladen
Owner

It would be stored as a UTF8String (if the platform supports it; otherwise it falls back to UnicodeString), so you get the benefit of a reference-counted, copy-on-write string.

@zuobaoquan
Author

Just an idea: an option to specify the default string encoding might be helpful.
For example, when reading or writing JSON with UTF-8 encoding (widely used), the object properties may simply be of type string. In that case, the underlying UTF8String instances would always be freed.

type
  TFoo = class
    Name: string;
  end;
