Parse bytes directly #356

robsmith11 · 2023-04-20T04:45:36Z

It would be nice if JSON.Parser.parse could be passed a vector of bytes and parse it assuming UTF-8 encoding without having to manually allocate a new String. My most common use case (probably for many other people too?) is downloading a JSON file with HTTP.get("...").body, which returns bytes.

KristofferC · 2023-04-20T06:14:59Z

You could maybe use https://github.com/JuliaStrings/StringViews.jl.

robsmith11 · 2023-04-20T07:05:20Z

StringViews.jl does look good for use in projects, but would it make sense for more casual interactive use to have JSON.jl do something automatically when passed bytes?

KristofferC · 2023-04-20T09:18:14Z

One issue with that is that, arguably, anything that accepts a string would then also have to accept a byte buffer. The best way to do that would probably be to take StringViews as a dependency and wrap the bytes in it. So the result would be more or less equivalent, except that every function would have to define this instead of the caller just doing it.

kpa28-git · 2023-05-05T20:55:48Z

I've noticed that using StringViews instead of String does not improve performance for me (actually slightly worse performance and higher allocations). The excerpt below is from the docs for String (Julia 1.8.5). If I'm understanding it right, strings produced from UTF-8 bytes already act like views.

String(v::AbstractVector{UInt8})
Create a new String object from a byte vector v containing UTF-8 encoded characters.
...
When possible, the memory of v will be used without copying when the String object is
created. This is guaranteed to be the case for byte vectors returned by take! on a writable
IOBuffer and by calls to read(io, nb). This allows zero-copy conversion of I/O data to
strings. In other cases, Vector{UInt8} data may be copied, but v is truncated anyway to
guarantee consistent behavior.
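
To make the quoted docstring concrete, here is a minimal sketch using only Base. The JSON payload is made up for illustration, and the sketch shows only the truncation behavior and the `take!` case the docstring mentions; it does not measure whether a copy actually happens in any given case.

```julia
# Bytes as they might arrive from an HTTP response body (hypothetical payload).
bytes = collect(codeunits("{\"a\": 1}"))   # a plain Vector{UInt8}

# String(v) truncates v, so pass a copy if the bytes are still needed afterwards.
kept = String(copy(bytes))
n_after_copy = length(bytes)               # `bytes` is untouched here

# Without the copy, the vector is emptied once the String has been created.
taken = String(bytes)
emptied = isempty(bytes)

# take! on a writable IOBuffer is one of the cases the docstring
# guarantees will reuse the memory without copying.
io = IOBuffer()
write(io, "{\"b\": 2}")
from_io = String(take!(io))
```

Either resulting `String` can then be handed to the parser as usual.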

KristofferC · 2023-05-05T22:13:14Z

"When possible"

This is not the case that often; the array needs to have been allocated in a special way for it.

Copying a chunk of memory like a string also tends to be quite fast, so it is plausible that you wouldn't notice it. And maybe StringViews has some issue that makes it slower than it should be.
