This document defines the biolinkml syntax and language.
A Biolinkml (blml) schema is a formal computable description of how entities within a data model are inter-related. While blml arose in response to a need in life-sciences domain modeling to define the Biolink Model, it is completely domain-neutral, and can be used to model pet stores, etc.
The primary representation of a schema is via a YAML document. This YAML document can be translated to other representations.
The 3 core modeling elements in blml are types, classes, and slots:
- types correspond to primitive datatypes, such as integers, strings, URIs
- classes are categories for data instances
- slots categorize the linkages instances can have to other instances, or to type instances
A schema is a collection of these elements.
blml also defines basic mechanisms for model element inheritance: the is_a, mixin and abstract properties for both classes and slots, plus a typeof property for types. In addition, the subclass_of property can anchor a class to the semantics of a term in an external third-party (but model-designated) ontology. Semantics within the model's own slot and class hierarchies are similarly constrained by the domain, range and subproperty_of properties.
blml is intended to be used in a variety of modeling contexts: JSON documents, RDF graphs, RDF* graphs and property graphs, as well as tabular data. Converters exist for these different representations.
This document contains a mixture of normative and informative sections. Normative sections may have informative examples within. Normative elements are those that are prescriptive, that is they are to be followed in order to comply with scheme requirements. Informative elements are those that are descriptive, that is they are designed to help the reader understand the concepts presented in the normative elements.
blml is also described by its own schema, which is normative. The schema can be viewed on this site.
The documentation in this specification must be consistent with the YAML representation.
The italicized keywords must, must not, should, should not, and may are used to specify normative features of blml documents and tools, and are interpreted as specified in RFC 2119.
The domain of biolinkml is an RDF graph:
G = Triple*
Triple = < Subject Predicate Object >
Subject = IRI | BlankNode
Predicate = IRI
Object = IRI | BlankNode | Literal
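Example (Informative):
The grammar above can be mirrored in a few lines of Python. This is only a sketch to make the shape of the domain concrete: the class names (IRI, BlankNode, Literal, Triple) are hypothetical and not part of any biolinkml API, and the example IRIs are taken from the prefixes used elsewhere in this document. It follows the RDF convention that only the third position of a triple (the object) may hold a literal.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


# Type aliases mirroring the grammar productions.
Subject = Union[IRI, BlankNode]
Predicate = IRI
Object = Union[IRI, BlankNode, Literal]


@dataclass(frozen=True)
class Triple:
    subject: Subject
    predicate: Predicate
    object: Object

    def __post_init__(self):
        # Enforce the grammar at construction time.
        if not isinstance(self.subject, (IRI, BlankNode)):
            raise TypeError("subject must be an IRI or a blank node")
        if not isinstance(self.predicate, IRI):
            raise TypeError("predicate must be an IRI")
        if not isinstance(self.object, (IRI, BlankNode, Literal)):
            raise TypeError("object must be an IRI, a blank node or a literal")


# G = Triple*: a graph is a (possibly empty) collection of triples.
graph = {
    Triple(
        IRI("https://example.org/example-schema#place1"),  # hypothetical subject
        IRI("http://www.w3.org/2003/01/geo/wgs84_pos#lat"),
        Literal("51.5"),
    ),
}
```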
The primary domain of biolinkml is an RDF graph, but blml schemas may be used for JSON documents, Property Graphs, UML object graphs, and tabular/relational data.
The normative representation of a blml schema is as a YAML document.
The document includes dictionaries of schema elements. Each dictionary is indexed by the element name. Dictionaries may be empty, and they may be listed in any order.
The structure of the document is:
- dictionary of prefixes
- dictionary of subsets
- dictionary of types
- dictionary of slots
- dictionary of classes
- additional schema-level declarations
- the schema must have an id
- the schema may have other declarations as allowed by SchemaDefinition
Example (Informative):
This example illustrates the broad structure of a blml schema. Ellipses indicate information omitted for brevity.
id: https://example.org/example-schema
name: example schema
description: This is...
license: https://creativecommons.org/publicdomain/zero/1.0/
prefixes:
  biolinkml: https://w3id.org/biolink/biolinkml/
  ex: https://example.org/example-schema#
  wgs: http://www.w3.org/2003/01/geo/wgs84_pos#
  qud: http://qudt.org/1.1/schema/qudt#
default_prefix: ex
default_curi_maps:
- semweb_context
emit_prefixes:
- rdf
- rdfs
- xsd
- skos
imports:
- biolinkml:types
subsets: ...
# Main schema follows
types: ...
slots: ...
classes: ...
All schema elements must have a unique name and a unique IRI. Names must be declared as keys in dictionaries. The IRI of each element is constructed automatically by concatenating the URI base of the default_prefix with the element name, formatted according to the construction rule for that element type:
- class elements use a CamelCase construction rule
- slot, type and subset elements use a snake_case construction rule
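Example (Informative):
The construction rules above can be sketched as follows. The helper names (camelcase, underscore, element_iri) and the exact normalization (splitting on whitespace and underscores, leaving letter case otherwise untouched) are assumptions made for illustration; the reference tooling may normalize names differently.

```python
import re


def camelcase(name: str) -> str:
    """CamelCase rule (assumed): 'named thing' -> 'NamedThing'."""
    words = [w for w in re.split(r"[\s_]+", name.strip()) if w]
    return "".join(w[:1].upper() + w[1:] for w in words)


def underscore(name: str) -> str:
    """snake_case rule (assumed): 'has part' -> 'has_part'."""
    return re.sub(r"\s+", "_", name.strip())


def element_iri(name: str, element_kind: str, prefix_base: str) -> str:
    """Concatenate the default prefix's URI base with the formatted name.

    element_kind is one of 'class', 'slot', 'type' or 'subset'.
    """
    formatted = camelcase(name) if element_kind == "class" else underscore(name)
    return prefix_base + formatted


base = "https://example.org/example-schema#"  # URI base of default_prefix 'ex'
class_iri = element_iri("named thing", "class", base)
slot_iri = element_iri("has part", "slot", base)
```

With the ex prefix from the example schema, a class named "named thing" and a slot named "has part" would receive the IRIs ending in #NamedThing and #has_part respectively.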
Values for schema element slots may be IRIs, and these may be specified as CURIEs. CURIEs are short-form representations of URIs, and must be specified as PREFIX:LocalID, where the prefix has an associated URI base. The prefix must be declared in one of several ways:
- a prefixes dictionary, where the keys are prefixes and the values are URI bases.
Example (Informative):
prefixes:
  biolinkml: https://w3id.org/biolink/biolinkml/
  wgs: http://www.w3.org/2003/01/geo/wgs84_pos#
  qud: http://qudt.org/1.1/schema/qudt#
The CURIE wgs:lat will expand to http://www.w3.org/2003/01/geo/wgs84_pos#lat.
- an external CURIE map specified via a default_curi_maps section.
Example (Informative):
default_curi_maps:
- semweb_context
- prefixes from public standard global namespaces used in the model (e.g. rdf) are indicated under the emit_prefixes section.
Example (Informative):
emit_prefixes:
- rdf
- rdfs
- xsd
- skos
- a default prefix within a given schema is generally also declared by a value for the default_prefix tag:
Example (Informative):
default_prefix: ex
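Example (Informative):
CURIE expansion against a prefixes dictionary amounts to a lookup and a string concatenation. The function below is a minimal sketch: expand_curie is a hypothetical name, and it only consults an in-memory prefixes dictionary, not external CURIE maps such as those named under default_curi_maps.

```python
def expand_curie(curie: str, prefixes: dict) -> str:
    """Expand PREFIX:LocalID using a {prefix: URI base} dictionary."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE (no ':' separator): {curie!r}")
    if prefix not in prefixes:
        # Note: a full IRI such as 'https://...' also lands here, because
        # 'https' is not a declared prefix in this sketch.
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local_id


prefixes = {
    "biolinkml": "https://w3id.org/biolink/biolinkml/",
    "wgs": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "qud": "http://qudt.org/1.1/schema/qudt#",
}

expanded = expand_curie("wgs:lat", prefixes)
```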
Imports are specified as an import list in the main schema object. This specifies a set: the order of elements is not important.
imports:
- <IMPORT_1>
- <IMPORT_2>
- ...
- <IMPORT_n>
As mentioned in the Introduction, semantic inheritance within a model is specified by several BiolinkML reserved properties:
- is_a:
- abstract:
- mixin:
- typeof:
- subclass_of:
- domain:
- range:
- subproperty_of:
A few fundamental rules guiding the use of these properties include:
- range: the range of the mixins property in a class SHOULD be a mixin
- homeomorphicity: is_a SHOULD only connect either (1) two mixins, (2) two classes, or (3) two slots
- instances MUST NOT instantiate a mixin slot or class directly since it has default abstract character; rather, it should be injected into non-abstract classes using the mixins property
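Example (Informative):
The three rules above lend themselves to a simple consistency check. The sketch below assumes a hypothetical in-memory representation in which each element is a dictionary with a kind ('class' or 'slot') and optional is_a, mixins, mixin and abstract keys; it is not the biolinkml loader or validator. It also reads the homeomorphicity rule strictly, requiring both ends of an is_a link to have the same kind and the same mixin status, which is an interpretation rather than something the rule spells out.

```python
def check_inheritance(elements: dict) -> list:
    """Return human-readable descriptions of rule violations."""
    problems = []
    for name, element in elements.items():
        parent_name = element.get("is_a")
        if parent_name is not None:
            parent = elements[parent_name]
            same_kind = parent["kind"] == element["kind"]
            same_mixin_status = bool(parent.get("mixin")) == bool(element.get("mixin"))
            if not (same_kind and same_mixin_status):
                problems.append(f"{name}: is_a {parent_name} is not homeomorphic")
        for mixin_name in element.get("mixins", []):
            if not elements[mixin_name].get("mixin"):
                problems.append(f"{name}: mixins entry {mixin_name} is not a mixin")
    return problems


def can_instantiate(name: str, elements: dict) -> bool:
    """Mixins are treated as abstract, so they cannot be instantiated directly."""
    element = elements[name]
    return not (element.get("mixin") or element.get("abstract"))


# Hypothetical model fragment used only to exercise the checks.
elements = {
    "named thing": {"kind": "class"},
    "has provenance": {"kind": "class", "mixin": True},
    "person": {"kind": "class", "is_a": "named thing", "mixins": ["has provenance"]},
    "bad": {"kind": "class", "is_a": "has provenance", "mixins": ["named thing"]},
}

problems = check_inheritance(elements)
```

Here only the element named "bad" is reported: its is_a parent is a mixin while it is not, and its mixins list names a non-mixin class.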
See SlotDefinition.
Slots are properties that can be assigned to classes.
The set of slots available in a model is defined in a slot dictionary, declared at the schema level:
slots:
  SLOT_NAME_1: DEFINITION_1
  SLOT_NAME_2: DEFINITION_2
  ...
  SLOT_NAME_m: DEFINITION_m
Each key in the dictionary is the slot name. The slot name must be unique.
The SlotDefinition is described in the metamodel.
Each slot must have zero or one is_a parents, as defined by biolinkml:is_a.
In addition a slot may have multiple mixin parents, as defined by biolinkml:mixins.
We define the function ancestors*(s) as the transitive closure of the parent relation starting from slot s, including s itself, where the parents of a slot are the union of its is_a and mixins values.
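Example (Informative):
The ancestors* function can be computed with an ordinary graph traversal over is_a and mixins links. The sketch below assumes a hypothetical dictionary representation of element definitions (name mapped to a dictionary with optional is_a and mixins keys) and hypothetical slot names; the same function applies unchanged to the class dictionary.

```python
def ancestors(name: str, elements: dict) -> set:
    """Reflexive transitive closure over is_a and mixins, starting at name."""
    result = set()
    pending = [name]
    while pending:
        current = pending.pop()
        if current in result:
            continue  # already visited; also guards against cycles
        result.add(current)
        definition = elements.get(current) or {}
        if definition.get("is_a"):
            pending.append(definition["is_a"])
        pending.extend(definition.get("mixins", []))
    return result


# Hypothetical slot dictionary used only for illustration.
slots = {
    "related to": {},
    "transitive": {"mixin": True},
    "has part": {"is_a": "related to", "mixins": ["transitive"]},
    "has proper part": {"is_a": "has part"},
}

closure = ancestors("has proper part", slots)
```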
In Biolink, as in the Web Ontology Language (OWL), a class is a classification of individuals into groups that share common characteristics. If an individual is a member of a class, this tells a machine reader that the individual falls under the semantic classification given by the class.
The set of classes available in a model is defined in a class dictionary, declared at the schema level. A class may have any number of slots declared.
classes:
  CLASS_NAME_1:
    slots:
      - CLASS_1_SLOT_NAME_1
      - CLASS_1_SLOT_NAME_2
      - ...
      - CLASS_1_SLOT_NAME_n
  ...
  CLASS_NAME_p:
    slots:
      - CLASS_p_SLOT_NAME_1
      - CLASS_p_SLOT_NAME_2
      - ...
      - CLASS_p_SLOT_NAME_q
Each declared slot must be defined in the slot dictionary. Note, however, that the use of a declared slot in instances of a class is not mandatory unless the slot_usage.required property of the slot is declared 'true', either directly in that class or indirectly, by parental inheritance.
Each class must have zero or one is_a parents, as defined by biolinkml:is_a.
In addition a class may have multiple mixin parents, as defined by biolinkml:mixins.
We define the function ancestors*(c) as the transitive closure of the parent relation starting from class c, including c itself, where the parents of a class are the union of its is_a and mixins values.
Each class may declare a dictionary of slot_usages.
CLASS:
  slot_usage:
    SLOT_1: USAGE_1
    SLOT_2: USAGE_2
    ...
    SLOT_n: USAGE_n
These refine a slot definition in the context of a particular class. The effective value of property p of slot s in class c is resolved as follows:
def effective_slot_property(s, p, c):
    if c declares a slot_usage for s that sets p:
        use the value for p
    elif any m in mixins(c) declares a slot_usage for s that sets p:
        use the value for p from this mixin
    else:
        if effective_slot_property(s, p, c.is_a) yields a value:
            use this
        elif effective_slot_property(s, p, m) yields a value for any mixin m of c:
            use this
        else:
            use the default value of p for s
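Example (Informative):
A runnable reading of the resolution procedure above is sketched here. It assumes hypothetical in-memory dictionaries (classes maps a class name to its is_a, mixins and slot_usage entries; slots maps a slot name to its default property values) and hypothetical element names. The private helper and the sentinel are implementation details of this sketch, used to tell "no value found" apart from a legitimately false value.

```python
_MISSING = object()  # sentinel: no slot_usage value found along this path


def _lookup_usage(s, p, c, classes):
    cls = classes[c]
    # 1. slot_usage declared directly on the class
    usage = cls.get("slot_usage", {}).get(s, {})
    if p in usage:
        return usage[p]
    # 2. slot_usage declared directly on one of the class's mixins
    for m in cls.get("mixins", []):
        mixin_usage = classes[m].get("slot_usage", {}).get(s, {})
        if p in mixin_usage:
            return mixin_usage[p]
    # 3. recurse through the is_a parent, then through each mixin
    parent = cls.get("is_a")
    if parent is not None:
        found = _lookup_usage(s, p, parent, classes)
        if found is not _MISSING:
            return found
    for m in cls.get("mixins", []):
        found = _lookup_usage(s, p, m, classes)
        if found is not _MISSING:
            return found
    return _MISSING


def effective_slot_property(s, p, c, classes, slots):
    found = _lookup_usage(s, p, c, classes)
    if found is not _MISSING:
        return found
    # 4. fall back to the value on the slot definition itself
    return slots[s].get(p)


# Hypothetical model fragment.
slots = {"name": {"required": False, "range": "string"}}
classes = {
    "NamedThing": {"slot_usage": {"name": {"required": True}}},
    "Person": {"is_a": "NamedThing"},
    "Thing": {},
}

required_for_person = effective_slot_property("name", "required", "Person", classes, slots)
```

In this fragment, Person inherits the required refinement from NamedThing, while Thing falls back to the slot's own default.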
This section describes which slots can be used to describe which instances.
Consider instance i of type t, with an association of type s connecting to object or instance j.
{
## instance i of type t
"s": <j>
}
For this to be valid, it must be the case that:
- for all r
You can import standard types:
imports:
- biolinkml:types
The imported biolinkml:types schema includes in its type dictionary entries such as:
date:
  uri: xsd:date
  base: XSDDate
  repr: str