Json-schema is very useful to document and validate inputs and outputs of JSON-based REST APIs. Unfortunately, the schemas are more verbose and less human-readable than one may wish. This library defines a more compact syntax to describe JSON schemas, as well as a parser to convert such specifications into actual JSON schema.
At some point in the future, this library may also offer the way back, from JSON schemas back to a compact notation.
Litteral JSON types are accessible as keywords: boolean
, string
,
integer
, number
, null
.
Regular expression strings are represented by r
-prefixed litteral
strings, similar to Python's litterals: r"^[0-9]+$"
converts into
{"type": "string", "pattern": "^[0-9]+$"}
.
Predefined formats are represented by f
-prefixed litteral
strings: f"uri"
converts into {"type": "string", "format": "uri"}
.
JSON constants are introduced between back-quotes: `123`
converts to {"const": 123}
. If several constants are joined with an
|
operator, they are translated into an enum: `1`|`2`
converts
to {"enum": [1, 2]}
.
Arrays are described between square brackets:
[]
describes every possible array, and can also be writtenarray
.- an homogeneous, non-empty array of integers is denoted
[integer+]
- an homogeneous, possibly empty array of integers is denoted
[integer*]
- an array starting with two booleans is denoted
[boolean, boolean]
. It can also contain additional items after those two booleans. - To forbid additional items, add an
only
keyword at the beginning of the array:[only boolean, boolean]
will reject[true, false, 1]
, whereas[boolean, boolean]
would have validated it. - arrays support cardinal suffix between braces:
[]{7}
is an array of 7 elements,[integer*]{3,8}
is an array of between 3 and 8 integers (inclusive),[]{_, 9}
an array of at most 9 elements,[string*]{4, _}
an array of at least 4 strings. - a uniqueness constraint can be added with the
unique
prefix, as in[unique integer+]
, which will allow[1, 2, 3]
but not[1, 2, 1]
since1
occurs more than once.
Strings and integers also support cardinal suffixes,
e.g. string{16}
, integer{_, 0xFFFF}
. Integer ranges as well as
sizes are inclusive.
Objects are described between curly braces:
{ }
describes every possible object, and can also be writtenobject
.{"bar": integer}
is an object with one field"bar"
of type integer, and possibly other fields.- Quotes are optional around property names, if they are identifiers other
than
"_"
or"only"
: it's legal to write{bar: integer}
. - To prevent non-listed property names from being accepted, use a
prefix
only
, as in{only "bar": integer}
. - property names can be forced to comply with a regex, by an
only r"regex"
prefix, which can also be a reference to a definition:{only r"^[a-z]+$"}
, or the equivalent{only <word>} where word=r"^[a-z]$"+
. Beware that according to jsonschema, even explicitly listed property names must comply with the regex, for instance nothing can satisfy the schema{only r"^[0-9]+$", "except_this": _}
. You can circumvent this limitation in several ways, e.g.{only r"^([0-9]+|except_this)$"}
, or{only <key>} where key = `"except_this"` | r"^[0-9]+$"
. - In addition to enforcing a regex on property names, one can also
enforce a type constraint on the associated values:
{only <word>: integer}
. If no naming constraint is desired, the name can be replaced by an underscore wildcard:{only _: integer}
. - A special type
forbidden
, equivalent to JSONSchema'sfalse
, can be used to specifically forbid a property name:{reserved_name?: forbidden}
. Notice that the question mark is mandatory: otherwise, it would both expect the property to exist, and accept no value in it.
Definitions can be used in the schema, and given with a suffix where name0 = def0 and ... and nameX=defX
. References to definitions are
put between angles, for instance {author: <user_name>} where user_name = r"^\w+$". When dumping the schema into actual jsonschema, unused definitions are pruned, and missing definitions cause an error. Definitions can only occur at top-level, i.e.
{foo: } where bar=numberis legal, but
{foo: ( where bar=number)}` is not.
Types can be combined:
- With infix operator
&
:A & B
is the type of objects which respect both schemasA
andB
. - With infix operator
|
:A | B
is the type of objects which respect at least one of the schemasA
orB
.&
takes precedence over|
, i.e.A & B | C & D
is to be read as(A&B) | (C&D)
. - With conditional expressions:
if A then B elif C then D else E
will enforce constraintB
if constraintA
is met, enforceD
ifC
is met, or enforceE
if neitherA
norC
are met.elif
andelse
parts are optional. For instance,if {country: "USA"} then {postcode: r"\d{5}(-\d{4})?"} else {postcode: string}
will only check the postcode with the regex if the country is"USA"
. - Parentheses can be added to enforce precedences , e.g.
A & (B|C) & D
Combinations can also be performed on Python objects, e.g. the following
Python expression is OK: Schema("{foo: number}") | Schema("{bar: number}"), and produces a schema equivalent to
Schema("{foo: number}|{bar: number}"). When definitions are merged in Python with
|or
&, their definitions are merged as needed. If a definition appears on both sides, it must be equal, i.e. one can merge
{foo: } where n=numberwith
{bar: } where n=number but not with
{foo: } where n=integer`.
schema ::= type («where» definitions)?
definitions ::= identifier «=» type («and» identifier «=» type)*
type ::= type «&» type # allOf those types; takes precedence over «|».
| type «|» type # anyOf those types.
| «(» type «)» # parentheses to enforce precedence.
| «not» type # anything but this type.
| «`»json_litteral«`» # just this JSON constant value.
| «<»identifier«>» # identifier refering to the matching top-level definition.
| r"regular_expression" # String matched by this regex.
| f"format" # json-schema draft7 string format.
| «string» cardinal? # a string, with this cardinal constraint on number of chars.
| «integer» cardinal? # an integer within the range described by cardinal.
| «integer» «/» int # an integer which must be multiple of that int.
| «object» # any object.
| «array» # any array.
| «boolean» # any boolean.
| «null» # the null value.
| «number» # any number.
| «forbidden» # empty type (used mostly to disallow a property name).
| object # structurally described object.
| array # structurally described array.
| conditional # conditional if/then/else rule
cardinal ::= «{» int «}» # Exactly that number of chars / items / properties.
| «{» «_», int «}» # At most that number of chars / items / properties.
| «{» int, «_» «}» # At least that number of chars / items / properties.
| «{» int, int «}» # A number of chars / items / properties within this range.
object ::= «{» object_restriction? (object_key «?»? «:» type «,»...)* «}» cardinal?
# if «only» occurs without a regex, no extra property is allowed.
# if «only» occurs with a regex, all extra property names must match that regex.
# if «?» occurs, the preceding property is optional, otherwise it's required.
object_restriction ::= ø
# Only explicitly listed property names are accepted:
| «only»
# non-listed property names must conform to regex/reference:
| «only» (r"regex" | «<»identifier«>»)
# non-listed property names must conform to regex, values to type:
| «only» (r"regex" | «<»identifier«>» | «_»)«:» type
object_key ::= identifier # Litteral property name.
| «"»string«"» # Properties which aren't identifiers must be quoted.
array ::= «[» «only»? «unique»? (type «,»)* («*»|«+»|ø) «]» cardinal?
# if «only» occurs, no extra item is allowed.
# if «unique» occurs, each array item must be different from every other.
# if «*» occurs, the last type can be repeated from 0 to any times.
# Every extra item must be of that type.
# if «+» occurs, the last type can be repeated from 1 to any times.
# Every extra item must be of that type.
conditional ::= «if» type «then» type («elif» type «then» type)* («else» type)?
int ::= /0x[0-9a-FA-F]+/ | /[0-9]+/
identifier ::= /[A-Za-z_][A-Za-z_0-9]*/
Some things that may be added in future versions:
- on numbers:
- ranges over floats (reusing cardinal grammar with float boundaries)
- modulus constraints on floats
number/0.25
. - exclusive ranges in addition to inclusive ones. May use returned
braces, e.g.
integer{0,0x100{
as an equivalent forinteger{0,0xFF}
? - ranges alone are treated as integer ranges, i.e.
{1, 5}
is a shortcut forinteger{1, 5}
? Not sure whether it enhances readability, and there would be a need for float support in ranges then.
- combine string constraints: regex, format, cardinals... This can
already be achieved with operator
&
. - try to embedded
#
-comments as"$comment"
- Implementation:
- bubble up
?
markers in grammar to the top level.
- bubble up
- Syntax sugar:
- optional marker:
foobar?
is equivalent tofoobar|null
. Not sure whether it's worth it, the difference between a missing field and a field holdingnull
is most commonly not significant. - check that references as
propertyNames
indeed point at string types. - make keyword case-insensitive?
- treat
{foo: forbidden}
as{foo?: forbidden}
as it's the only thing that would make sense?
- optional marker:
- better error messages, on incorrect grammars, and on non-validating JSON data.
- reciprocal feature: try and translate a JSON-schema into a shorter and more readable JSCN source.
$ echo -n '[integer*]' | jscn -
{ "type": "array",
"items": {"type": "integer"},
"$schema": "http://json-schema.org/draft-07/schema#"
}
$ jscn --help
usage: jscn [-h] [-o OUTPUT] [-v] [--version] [filename]
Convert from a compact DSL into full JSON schema.
positional arguments:
filename Input file; use '-' to read from stdin.
optional arguments:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
Output file; defaults to stdout
-v, --verbose Verbose output
--version Display version and exit
Python's jsonschema_cn
package exports two main constructors:
Schema()
, which compiles a source string into a schema object;Definitions()
, which compiles a source string (a sequence of definitions separated by keywordand
, as in ruledefinitions
of the formal grammar.
Schema objects have a jsonschema
property, which contains the Python
dict of the corresponding JSON schema.
Schemas can be combined with Python operators &
("allOf"
) and |
("anyOf"
). When they have definitions, those definition sets are
merged, and definition names must not overlap.
Schemas can also be combined with definitions through |
, and
definitions can be combined together also with |
.
>>> from jsonschema import Schema, Definitions
>>> defs = Definitions("""
>>> id = r"[a-z]+" and
>>> byte = integer{0,0xff}
>>> """)
>>> s = Schema("{only <id>: <byte>}") | defs
>>> s.jsonschema
ValueError: Missing definition for byte
>>> s = s | defs
>>> s.jsonschema
{"$schema": "http://json-schema.org/draft-07/schema#"
"type": "object",
"propertyNames": {"$ref": "#/definitions/id"},
"additionalProperties": {"$ref": "#/definitions/byte"},
"definitions": {
"id": {"type": "string", "pattern": "[a-z]+"},
"byte": {"type": "integer", "minimum": 0, "maximum": 255}
}
}
>>> Schema("[integer, boolean+]{4}").jsonschema
{ "$schema": "http://json-schema.org/draft-07/schema#",
"type": "array",
"minItems": 4, "maxItems": 4,
"items": [{"type": "integer"}],
"additionalItems": {"type": "boolean"},
}
If you spend a lot of time dealing with complex JSON data structures, you might also want to try jsview, a smarter JSON formatter, which tries to effectively use both your screen's width and height, by only inserting q carriage returns when it makes sense:
$ cat >schema.cn <<EOF
{ only codes: [<byte>+], id: r"[a-z]+", issued: f"date"}
where byte = integer{0, 0xFF}
EOF
$ jscn schema.cn
{"type": "object", "required": ["codes", "id", "issued"], "properties": {
"codes": {"type": "array", "items": [{"$ref": "#/definitions/byte"}], "ad
ditionalItems": {"$ref": "#/definitions/byte"}}, "id": {"type": "string",
"pattern": "[a-z]+"}, "issued": {"type": "string", "format": "date"}}, "a
dditionalProperties": false, "definitions": {"byte": {"type": "integer",
"minimum": 0, "maximum": 255}}, "$schema": "http://json-schema.org/draft-
07/schema#"}
$ cat schema.cn | jscn - | jsview -
{
"type": "object",
"required": ["codes", "id", "issued"],
"properties": {
"codes": {
"type": "array",
"items": [{"$ref": "#/definitions/byte"}],
"additionalItems": {"$ref": "#/definitions/byte"}
},
"id": {"type": "string", "pattern": "[a-z]+"},
"issued": {"type": "string", "format": "date"}
},
"additionalProperties": false,
"definitions": {"byte": {"type": "integer", "minimum": 0, "maximum": 255}},
"$schema": "http://json-schema.org/draft-07/schema#"
}