It is valid for a JSON text to represent only a single scalar value, rather than an object or array - this is supported by Python's `json` module:
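For example, `json.loads` accepts bare scalars directly:

```python
import json

# Each of these is a complete, valid JSON text in its own right:
print(json.loads("true"))    # True
print(json.loads("false"))   # False
print(json.loads("null"))    # None
print(json.loads("42"))      # 42
print(json.loads('"text"'))  # 'text'
```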
However, a stream containing such texts will not be split correctly by `splitstream`. The keywords `true`, `false`, and `null` are silently dropped, as are numeric literals:
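A minimal reproduction sketch, assuming the `splitfile(f, format="json")` entry point from the README:

```python
import io
from splitstream import splitfile  # assumes the README's splitfile(f, format="json") API

stream = io.BytesIO(b'true false null 42 {"a": 1}')
print(list(splitfile(stream, format="json")))
# Expected: five separate buffers, one per top-level value.
# Observed: only the object, {"a": 1}, comes back; the keywords
# and the numeric literal vanish without any error.
```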
Attempting to insert a string literal will cause different, still incorrect behaviour. If there are no objects or arrays in the stream, the text is still silently dropped; however, if there is an object or array occurring somewhere after the string, the entire stream up to that object or array will be captured as one buffer.
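Under the same `splitfile` assumptions, the string-literal case looks like this:

```python
import io
from splitstream import splitfile  # assumed API, as above

# A lone string is silently dropped:
print(list(splitfile(io.BytesIO(b'"text"'), format="json")))  # []

# With an object later in the stream, everything up to that object
# comes back as a single buffer rather than three separate values:
print(list(splitfile(io.BytesIO(b'"text" 42 {"a": 1}'), format="json")))
```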
Attempting to parse these buffers with `json.loads`, naturally, does not work.

The correct behaviour would be to split the stream on every top-level JSON value, producing separate buffers for each - in other words:
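Something like this (illustrative, using the same assumed entry point):

```python
import io
from splitstream import splitfile  # assumed API, as above

stream = io.BytesIO(b'true false null 42 "text" {"a": 1}')
for value in splitfile(stream, format="json"):
    print(value)
# Desired output, one buffer per top-level value:
#   true
#   false
#   null
#   42
#   "text"
#   {"a": 1}
```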
I agree that ideally the stream should be split on every properly terminated (sub)document. This issue is currently documented as a known limitation (only arrays and objects are supported at the top level). Currently the splitter uses a very basic lexer, and there is unfortunately no really quick fix that maintains the current level of performance.

Edit to name this approach: "maximal syntactically valid substring matching". And to note that the JSON version (lower-case `true`, `false`, &c.) would also tokenize just fine. Those are valid Python symbols, even if they're not the correct ones for those singletons. Edit edit: this could implement parsing, too, by invoking `literal_eval` across the isolated fragments. JSON is valid Python, after all. 😜
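A quick sketch of that idea: the JSON singletons tokenize as ordinary `NAME` tokens, and `ast.literal_eval` handles objects, arrays, numbers, and strings directly, though `true`/`false`/`null` themselves would still need mapping to Python's spellings:

```python
import ast
import io
import tokenize

# The JSON keywords tokenize as plain Python NAME tokens:
for tok in tokenize.tokenize(io.BytesIO(b'true false null {"a": 1}\n').readline):
    print(tokenize.tok_name[tok.exact_type], repr(tok.string))

# literal_eval already accepts JSON objects/arrays/numbers/strings...
print(ast.literal_eval('{"a": [1, 2.5, "x"]}'))  # {'a': [1, 2.5, 'x']}

# ...but the lower-case singletons are the wrong spellings:
# ast.literal_eval('true') raises ValueError, while
print(ast.literal_eval('True'))  # True
```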