Support UTF8String/WideString in JsonObject/JsonArray #31

zuobaoquan · 2016-08-14T09:00:51Z

I have to serialize and transmit data structures like this:

TFoo = class
  Html: WideString;
  FileContent: UTF8String;
end;

For example, the producer may be Internet Explorer or an IDE, which limits the type and encoding I can use.

It seems that I have to cast them to and from string on both sides. Since the content is pretty large, it would be great if the object model could support these two types natively. It should also be more efficient, as I use UTF-8 encoding for transmission.

ahausladen · 2016-08-14T09:23:06Z

UTF8String would be a good addition.

WideString, on the other hand, is a copy-on-assign data type. The data would still be copied into the internal data structure, so it makes no difference whether it is copied to another WideString or to a UnicodeString (which then uses copy-on-write internally).
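To illustrate the difference between copy-on-assign and reference counting, here is a minimal sketch. It assumes a Unicode-enabled Delphi (2009 or later) targeting Windows, where WideString is a COM BSTR; on other platforms WideString may behave differently. The program name and variables are only for illustration.

```pascal
program CopySemanticsDemo;

{$APPTYPE CONSOLE}

var
  W1, W2: WideString;
  U1, U2: UnicodeString;
begin
  // Build the strings at run time so they live on the heap,
  // not in constant data.
  W1 := StringOfChar('a', 16);
  U1 := StringOfChar('b', 16);

  W2 := W1; // WideString: a new buffer is allocated and the characters are copied
  U2 := U1; // UnicodeString: only the reference count is incremented

  // Compare the addresses of the character buffers.
  Writeln('WideString shares buffer:    ', Pointer(W1) = Pointer(W2));
  Writeln('UnicodeString shares buffer: ', Pointer(U1) = Pointer(U2));
end.
```

Under these assumptions the two WideString variables should end up with separate buffers, while the two UnicodeString variables share one until either of them is modified.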

When I initially developed JsonDataObjects, I thought about using UTF8String internally, solely to reduce memory usage. But that would have meant that a conversion is needed every time a property is accessed.

Maybe I should add logic so that, if you parse a UTF-8 stream, all strings are stored as UTF-8, and if you access a property via a UnicodeString getter, the value is converted automatically and the UTF-16 string replaces the internally stored UTF-8 string. So you can have the best of both worlds: only strings that are accessed are converted to UTF-16, which makes the UTF-8 parser a little bit faster and saves memory (unless you have code points that require more than two UTF-8 bytes).
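The lazy conversion described above could look roughly like the following sketch. It is not JsonDataObjects code: the record `TLazyJsonString` and its methods are hypothetical names made up for illustration, and a Unicode-enabled Delphi (2009 or later) is assumed.

```pascal
program LazyUtf8Demo;

{$APPTYPE CONSOLE}

type
  // Hypothetical holder for one JSON string value (not part of JsonDataObjects).
  TLazyJsonString = record
  private
    FUtf8: UTF8String;      // set by the UTF-8 parser
    FUtf16: UnicodeString;  // set on the first UnicodeString access
  public
    procedure SetUtf8(const Value: UTF8String);
    function AsUtf8: UTF8String;
    function AsString: UnicodeString;
  end;

procedure TLazyJsonString.SetUtf8(const Value: UTF8String);
begin
  FUtf8 := Value; // reference-counted assignment, the bytes are not copied
  FUtf16 := '';
end;

function TLazyJsonString.AsString: UnicodeString;
begin
  if FUtf8 <> '' then
  begin
    // First UnicodeString access: convert once, then drop the UTF-8 data.
    FUtf16 := UTF8ToUnicodeString(FUtf8);
    FUtf8 := '';
  end;
  Result := FUtf16;
end;

function TLazyJsonString.AsUtf8: UTF8String;
begin
  if FUtf8 <> '' then
    Result := FUtf8                // still in parsed form, no conversion
  else
    Result := UTF8Encode(FUtf16);  // already converted, encode again
end;

var
  Value: TLazyJsonString;
begin
  Value.SetUtf8('Hello');
  Writeln(Length(Value.AsUtf8)); // served from the stored UTF-8 string
  Writeln(Value.AsString);       // triggers the one-time conversion
  Writeln(Length(Value.AsUtf8)); // now re-encoded from UTF-16
end.
```

In this sketch a value that is only ever read and written as UTF-8 is never converted, while a value read through `AsString` pays for the conversion once and is kept as UTF-16 afterwards.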

zuobaoquan · 2016-08-14T09:40:14Z

OK, let's forget about WideString. Although there are potential solutions, I can live with that.

UTF8String is more useful. Your idea is brilliant :-)

zuobaoquan · 2016-08-14T10:18:24Z

By the way, with your idea, when parsing a UTF-8 stream, will the underlying data be a UTF8String or a managed memory buffer?
In the former case, will the underlying UTF8String always be freed when serializing/deserializing objects that contain string members to/from a UTF-8 stream?

ahausladen · 2016-08-14T11:16:39Z

It would be stored as a UTF8String (if the platform supports it; otherwise it falls back to UnicodeString), so you have the benefit of a reference-counted copy-on-write string.

zuobaoquan · 2016-08-14T11:27:10Z

Just an idea: an option to specify the default string encoding might be helpful.
For example, when reading/writing JSON with UTF-8 encoding (which is widely used), the object properties may be plain string; in this case, the underlying UTF8String instances will always be freed.

TFoo = class
  Name: string;
end;
