-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: JSON Module #83
base: master
Are you sure you want to change the base?
Conversation
f1d17c8
to
472d4f1
Compare
The `json.decode()` function decodes a JSON document into a Starlark value. | ||
|
||
```python | ||
def json.decode(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! Is there any reason we wouldn't follow the python json interface, perhaps only the string loads
/dumps
versions? They might have a few extra flags, but 99% of the uses would suffice with the flags already included in this spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with following the Python API here. Historically Bazel hasn't worried too much about cross-compatibility with the Python stdlib, and I thought loads
/ dumps
is a little opaque to people who don't use Python regularly, but I'd accept the functionality under ~any name.
@laurentlb do you (or anyone else) have concerns with this? I'd love to see this merged so that we could move towards an implementation. A tool I'm working on needs JSON, and I've ended up copy-pasting files from this PR as there isn't a standard here yet: google/starlark-go#179 |
This change defines a standard Starlark module for JSON encoding and decoding. See json.go for documentation. It is intended to subsume, generalize, and eventually replace Bazel's ill-conceived struct.to_json method. The json module is predeclared in the Starlark REPL environment. See related issues: bazelbuild/bazel#7896 https://buganizer.corp.google.com/issues/23962735 https://buganizer.corp.google.com/issues/70210417 bazelbuild/bazel#7879 (comment) bazelbuild/bazel#5542 bazelbuild/bazel#10176 bazelbuild/starlark#83 bazelbuild/bazel#3732 Change-Id: I297ffaee9349eedeeb52f5a88f40636a4095f997
This change defines a standard Starlark module for JSON encoding and decoding. See json.go for documentation. It is intended to subsume, generalize, and eventually replace Bazel's ill-conceived struct.to_json method. The json module is predeclared in the Starlark REPL environment. See related issues: bazelbuild/bazel#7896 https://buganizer.corp.google.com/issues/23962735 https://buganizer.corp.google.com/issues/70210417 bazelbuild/bazel#7879 (comment) bazelbuild/bazel#5542 bazelbuild/bazel#10176 bazelbuild/starlark#83 bazelbuild/bazel#3732
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of the JSON API, and the description here seems quite good. A few questions:
What is the diff to the spec? Does this proposal go as a brand new file in the repo? Assuming its accepted, what diff would we apply to the spec?
What is a module? How do you in a standards conforming way import the "json" module? I couldn't see any existing modules in the spec (maybe I missed them?). Or maybe you mean json is an always available name with some fields on it? If so, it feels a bit weird to introduce a whole new mechanism not present elsewhere in the spec for just two functions. Or maybe you want to use this mechanism more widely?
* JSON strings are decoded to Starlark `string` values. | ||
* JSON numbers with no fractional component are decoded to Starlark `int` values. Starlark implementations without | ||
arbitrary-precision integers should reject numbers that exceed their supported range. | ||
* JSON numbers with a fractional component may be decoded to an arbitrary-precision or floating-point value, if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it conforming for implementations without floating-point values to keep them as a string? Or do they have to error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An implicit type conversion of float to string would be very surprising to me as a user, so I think I'd prefer the implementation to error if it can't accept a given input value.
arbitrary-precision integers should reject numbers that exceed their supported range. | ||
* JSON numbers with a fractional component may be decoded to an arbitrary-precision or floating-point value, if | ||
supported by the current Starlark implementation. | ||
* Starlark implementations without arbitrary-precision numeric values should reject numbers that exceed their |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if they are within the bounds, but exceed the precision to represent them unambiguously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess "range" is a little wrong here, but I mean numbers that the implementation can't represent. In the trivial case, a value 0.00[...]01
with sufficient zeros to overflow a 64-bit IEEE754 would be rejected by most implementations.
I hope that issues of floating point are mostly academic because most Starlark implementations don't allow floats by default, and the primary client codebase (Bazel extensions) doesn't allow floats at all.
No idea! Happy to format this in whatever way seems reasonable to you. It doesn't seem like there's much prior art here, so I formatted it as an RFC-type file.
Bazel defines a list of "global modules" (https://docs.bazel.build/versions/3.4.0/skylark/lib/skylark-overview.html#--global-modules) but doesn't go into detail about what exactly a "module" is at the language level. For this proposal I had imagined a new global symbol named
If this proposal is successful + accepted then I would like to propose more modules, specifically |
But Starlark doesn't. Maybe we need a corresponding piece of the Starlark spec that says what a "module" is, whether it needs loading, how it behaves, whether it's a first-class value etc. |
Hi, back from vacation. Sorry this proposal has been neglected and that this has process not been a model of good open-source stewardship. Regarding the the project: Starlark/Java was until very recently highly entangled with Bazel, though after a year of work nearly all its production dependencies on "Bazel proper" are now severed. However, the tests are still entangled, and there are numerous subtle constraints imposed on Starlark by Bazel that I plan to break soon to unblock significant optimizations (e.g. flat environments, byte code compilation). In addition we plan to move the Java classes to a different package (java.starlark.net) to emphasize the boundary between components, though we have no plans yet to host them in a separate repository; that would of course require the API to be refined, stabilized, and frozen. I believe the separation is in the long term interests of the Bazel project and its users, especially if Starlark becomes a widely used and more familiar language for configuration, but for the Bazel team it does pose immediate costs (in ceded control), and the team has yet to make any public commitments about Starlark. Regarding JSON: In June I committed https://github.com/google/starlark-go/blob/master/starlarkjson/json.go#L26, a JSON module I wrote some years ago for the Go implementation of Starlark (go.starlark.net). I believe its interface addresses all the needs met by the one proposed here, and indeed was influenced by it. It is not too late to change it, so if you see any shortcomings of deficiencies, please let me know. Otherwise I propose that we port it directly to Java. Regarding modules: Starlark doesn't yet have a standard library of packages, though there is clearly a recurring need for things like json, time, http, and html. (Brendan @b5 at https://github.com/qri-io/starlib has built many good candidates for standardization.) We don't even have a clear decision yet on how such packages should be imported. Sometimes we use load(), other times the names are predeclared (as I did with 'json' in the Go Starlark REPL, though I don't think it's very satisfactory). Personally I would like to extend load so that modules are first-class, e.g. something like Regarding proto: I have ported the Go implementation of the Starlark proto module so that it depends only on open-source proto reflection (which is new), so I should be able to open-source it. However, proto support for Starlark/Java will be tricky, because it has no means of representing byte strings, floating-point numbers, or integers outside the signed 32-bit range, and each of those poses surprisingly thorny problems. BTW: Nick: I enjoyed both of your papers on build systems! |
This CL ports the go.starlark.net/starlarkjson module from Go to Java. The json module provides four functions: json.decode json.encode json.indent (not yet implemented) json.encode_indent (not yet implemented) It is tested through eval.ScriptTest, which adds the json module to its environment, along with 'struct', a simple struct-like type. Some tests are commented out, awaiting StarlarkFloat, or richer string support. This is a first step towards removing Bazel's struct.to_json. Updates bazelbuild/starlark#83 PiperOrigin-RevId: 337944489
I just committed bazelbuild/bazel@6e47a40, which adds a Thanks to @brandjon for many very careful rounds of review. |
What's the use-case for the |
Perhaps the caller wants to insert the result inside some other piece of text, or JSON header/footer, that imposes its own indentation. |
This change predeclares 'json', the new Starlark module for JSON encoding/decoding/indenting, in all Bazel Starlark environments (alongside depset, select, etc). The new function works for any legal value, not just struct, and avoids polluting the struct field namespace. struct.to_json is deprecated, along with struct.to_proto, and will be disabled by the new flag --incompatible_struct_has_no_methods. (The replacement for to_proto will be added shortly.) Updates bazelbuild/starlark#83 Updates #3732 Updates #1813 RELNOTES: The Starlark json module is now available. Use json.encode(x) to encode a Starlark value as JSON. struct.to_json(x) is deprecated and will be disabled by the --incompatible_struct_has_no_methods flag. PiperOrigin-RevId: 338259618
Both the Java and Go implementations now have compatible implementations of the json module. Closing. |
Since it's been implemented, shouldn't this PR be merged rather than closed? Optionally with updates to reflect whatever common API was decided upon by the Java/Go impl devs. |
Er, yes, sorry about that. I just noticed that I also have a bunch of old pending comments on your doc from months ago that some how never got flushed out; I'll do that now. The Java implementation is here: |
|
||
## Abstract | ||
|
||
This document proposes an API for encoding and decoding JSON from Starlark. If accepted, implementations of Starlark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sending this. I'm not sure how it escaped my attention till today, when I happened to resume work on the go.starlark.net JSON encoder/decoder (google/starlark-go#179) that you mention below.
|
||
* Bazel's [`struct`](https://docs.bazel.build/versions/master/skylark/lib/struct.html) type has a `to_json()` method | ||
that can generate JSON documents. Issue [bazelbuild/bazel#7879](https://github.com/bazelbuild/bazel/issues/7879) | ||
requests additional control over the behavior of this method (i.e. optional whitespace). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is addressed in my Go implementation by separating encoding and prettyprinting, in exactly the same way as these concepts are separated in the Go standard library's encoding/json package. Existing JSON strings can be prettyprinted in a single pass with very little state, without the need to decode and reencode.
``` | ||
|
||
Type conversions are: | ||
* JSON arrays are decoded to Starlark `list` values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add: new, unfrozen list values. Ditto for dicts.
* JSON objects are decoded to Starlark `dict` values. Keys are in the same order as the input data. | ||
* JSON `true`, `false`, and `null` literals are decoded to Starlark `True`, `False`, and `None` respectively. | ||
* JSON strings are decoded to Starlark `string` values. | ||
* JSON numbers with no fractional component are decoded to Starlark `int` values. Starlark implementations without |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Go implementation now emits a string of decimal integers for a Starlark int value, no matter how large; it does not truncate big ints to uint64 or int53 or int32, nor use scientific notation. This does mean that decoder implementations with integer width limitations may not be able to read these files. However, the meaning of the JSON files is quite clear.
The Go impl also uses a %g floating point representation for a Starlark value of type float, even if the value is integral. This means round-tripping Starlark values via JSON preserves int vs float representation type, which seems desirable. (The decoder uses the presence of a decimal point to indicate 'float'.)
arbitrary-precision integers should reject numbers that exceed their supported range. | ||
* JSON numbers with a fractional component may be decoded to an arbitrary-precision or floating-point value, if | ||
supported by the current Starlark implementation. | ||
* Starlark implementations without arbitrary-precision numeric values should reject numbers that exceed their |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that rejecting unrepresentable values is the best course of action.
``` | ||
|
||
Type conversions are: | ||
* Starlark `list` and `tuple` values are encoded to JSON arrays. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Go implementation generalizes this to all iterable sequences (that are not iterable mappings).
|
||
Type conversions are: | ||
* Starlark `list` and `tuple` values are encoded to JSON arrays. | ||
* Starlark `dict` values are encoded to JSON objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Go implementation also encodes Starlark struct values as JSON objects.
If `indent` is a number, it is how many spaces to indent by. Indent levels less than 1 will only insert newlines. | ||
If `indent` is `None` (the default), JSON will be encoded in one line with no extra spaces. | ||
|
||
If `sort_keys` is `True`, then encoded objects' keys are sorted in lexicographical order. If `sort_keys` is `False` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this feature necessary?
(Update: During review of the Java implementation, we decided that structs and dicts should both sort their fields/keys.)
This document proposes an API for encoding and decoding JSON from Starlark. If accepted, implementations of Starlark that implement JSON support would be expected to implement this API. The goal is to allow users to write JSON code that behaves consistently across Starlark implementations.