This repository has been archived by the owner on Oct 2, 2021. It is now read-only.

SPECIFICATION.md

File metadata and controls

394 lines (268 loc) · 11.6 KB

biolinkml specification (DRAFT)

Introduction (Informative)

This document defines the biolinkml syntax and language.

A Biolinkml (blml) schema is a formal computable description of how entities within a data model are inter-related. While blml arose in response to a need in life-sciences domain modeling to define the Biolink Model, it is completely domain-neutral, and can be used to model pet stores, etc.

The primary representation of a schema is via a YAML document. This YAML document can be translated to other representations.

The 3 core modeling elements in blml are types, classes, and slots:

  • types correspond to primitive datatypes, such as integers, strings, URIs
  • classes are categories for data instances
  • slots categorize the linkages instances can have to other instances, or to type instances

A schema is a collection of these elements.

blml also defines basic mechanisms for model element inheritance: is_a, mixin and abstract properties for both classes and slots, plus a typeof property for types. In addition, the subclass_of property can anchor a class to the semantics of an ontology term in an external third-party (but model-designated) ontology. 'Internal' model slot and class hierarchies are similarly constrained by the domain, range and subproperty_of properties.

blml is intended to be used in a variety of modeling contexts: JSON documents, RDF graphs, RDF* graphs and property graphs, as well as tabular data. Converters exist for these different representations.

This document contains a mixture of normative and informative sections. Normative sections may have informative examples within. Normative elements are those that are prescriptive, that is they are to be followed in order to comply with scheme requirements. Informative elements are those that are descriptive, that is they are designed to help the reader understand the concepts presented in the normative elements.

blml is also described by its own schema, which is also Normative. The schema can also be viewed on this site.

The documentation in this specification must be consistent with the YAML representation.

The italicized keywords must, must not, should, should not, and may are used to specify normative features of blml documents and tools, and are interpreted as specified in RFC 2119.

Domain (Normative)

The domain of biolinkml is an RDF graph:

G = Triple*
Triple = < Subject Predicate Object >
Subject = IRI | BlankNode
Predicate = IRI
Object = IRI | BlankNode | Literal
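
The grammar above can be read as a constraint on which kind of term may appear in each position of a triple; in RDF the third position is the object, and it is the only one that may hold a literal. The following is a minimal sketch of that check. The class names `IRI`, `BlankNode`, `Literal` and the function `is_valid_triple` are illustrative assumptions, not part of biolinkml or of any RDF library.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


Term = Union[IRI, BlankNode, Literal]


def is_valid_triple(subject: Term, predicate: Term, obj: Term) -> bool:
    """Check the term-position constraints of a single RDF triple."""
    return (
        isinstance(subject, (IRI, BlankNode))              # Subject = IRI | BlankNode
        and isinstance(predicate, IRI)                     # Predicate = IRI
        and isinstance(obj, (IRI, BlankNode, Literal))     # Object may also be a Literal
    )
```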

Notes (informative)

The primary domain of biolinkml is an RDF graph, but blml schemas may be used for JSON documents, Property Graphs, UML object graphs, and tabular/relational data.

Schema Representation (Informative)

The normative representation of a blml schema is as a YAML document.

The document includes dictionaries of schema elements. Each dictionary is indexed by the element name. Dictionaries may be empty, and they may be listed in any order.

The document is structured as follows:

  • dictionary of prefixes
  • dictionary of subsets
  • dictionary of types
  • dictionary of slots
  • dictionary of classes
  • additional schema-level declarations
    • the schema must have an id
    • the schema may have other declarations as allowed by SchemaDefinition

Example (Informative):

This example illustrates the broad structure of a blml schema. Ellipses indicate information omitted for brevity.

id: https://example.org/example-schema
name: example schema
description: This is...
license: https://creativecommons.org/publicdomain/zero/1.0/

prefixes:
  biolinkml: https://w3id.org/biolink/biolinkml/
  ex: https://example.org/example-schema#
  wgs: http://www.w3.org/2003/01/geo/wgs84_pos#
  qud: http://qudt.org/1.1/schema/qudt#
  
default_prefix: ex

default_curi_maps:
  - semweb_context

emit_prefixes:
  - rdf
  - rdfs
  - xsd
  - skos

imports:
  - biolinkml:types

subsets: ...

# Main schema follows
types: ...
slots: ...
classes: ...

Names and Namespaces (Normative)

All schema elements must have a unique name and a unique IRI. Names must be declared as keys in dictionaries, and IRIs are constructed automatically by concatenating the URI base of the default_prefix with the element name, formatted according to the construction rule for that element type:

  • class elements use a CamelCase construction rule
  • slot, type, and subset elements use a snake_case construction rule
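
As a rough illustration of these construction rules, the sketch below builds an element IRI from a name. The helper names (`camel_case`, `snake_case`, `element_iri`) and the exact word-splitting behaviour are assumptions made for this example; the specification text does not spell out the formatting rules in detail, and the actual biolinkml tooling may differ.

```python
import re


def camel_case(name: str) -> str:
    """Assumed CamelCase rule: capitalise each word and join without separators."""
    words = [w for w in re.split(r"[\s_]+", name.strip()) if w]
    return "".join(w[:1].upper() + w[1:] for w in words)


def snake_case(name: str) -> str:
    """Assumed snake_case rule: join whitespace-separated words with underscores."""
    return "_".join(w for w in re.split(r"\s+", name.strip()) if w)


def element_iri(name: str, element_type: str, default_base: str) -> str:
    """Concatenate the default prefix's URI base with the formatted element name."""
    local = camel_case(name) if element_type == "class" else snake_case(name)
    return default_base + local


BASE = "https://example.org/example-schema#"  # URI base of the 'ex' prefix used in this document
print(element_iri("named thing", "class", BASE))
print(element_iri("related to", "slot", BASE))
```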

Values for schema element slots may be IRIs, and these may be specified as CURIEs. CURIEs are short-form representations of URIs, and must be specified as PREFIX:LocalID, where the prefix has an associated URI base. The prefix must be declared in one of several ways:

  • a prefixes dictionary, where the keys are prefixes and the values are URI bases.

Example (Informative):

prefixes:
  biolinkml: https://w3id.org/biolink/biolinkml/
  wgs: http://www.w3.org/2003/01/geo/wgs84_pos#
  qud: http://qudt.org/1.1/schema/qudt#

The CURIE wgs:lat will expand to http://www.w3.org/2003/01/geo/wgs84_pos#lat.
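
CURIE expansion against a prefixes dictionary amounts to a lookup followed by string concatenation. The sketch below assumes the prefixes have already been loaded into a plain Python dict; `expand_curie` is a hypothetical helper written for this example, not a biolinkml API.

```python
from typing import Dict

# Prefix map copied from the example above.
PREFIXES: Dict[str, str] = {
    "biolinkml": "https://w3id.org/biolink/biolinkml/",
    "wgs": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "qud": "http://qudt.org/1.1/schema/qudt#",
}


def expand_curie(curie: str, prefixes: Dict[str, str]) -> str:
    """Expand PREFIX:LocalID to a full URI using the declared URI bases."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local_id


print(expand_curie("wgs:lat", PREFIXES))
```

Raising on an undeclared prefix, rather than returning the input unchanged, is a design choice of this sketch: it surfaces a missing prefix declaration early.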

Example (Informative):

default_curi_maps:
  - semweb_context

  • prefixes from public standard global namespaces used in the model (e.g. rdf) are indicated under the emit_prefixes section.

Example (Informative):

emit_prefixes:
  - rdf
  - rdfs
  - xsd
  - skos

  • a default prefix within a given schema is generally also declared by a value for the default_prefix tag:

Example (Informative):

default_prefix: ex

Schema Elements (Normative)

Imports (Normative)

Imports are specified as an import list in the main schema object. This specifies a set: the order of elements is not important.

imports:
  - <IMPORT_1>
  - <IMPORT_2>
  - ...
  - <IMPORT_n>  

Metadata elements (Normative)

As mentioned in the Introduction, semantic inheritance within a model is specified by several BiolinkML reserved properties:

  • is_a:
  • abstract:
  • mixin:
  • typeof:
  • subclass_of:
  • domain:
  • range:
  • subproperty_of:

A few fundamental rules guiding the use of these properties include:

  • range: the range of the mixins property in a class SHOULD be a mixin
  • homeomorphicity: is_a SHOULD only connect either (1) two mixins (2) two classes (3) two slots
  • instances MUST NOT instantiate a mixin slot or class directly, since mixins are abstract by default; rather, a mixin should be injected into non-abstract classes using the mixins property
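
The first two rules lend themselves to a simple lint pass over one element dictionary (classes or slots). The sketch below is one possible reading, not a normative check: it assumes each definition is a plain dict with optional `is_a`, `mixin` and `mixins` keys, that all referenced names resolve, and that "homeomorphicity" within a single dictionary means an is_a link should not join a mixin to a non-mixin. The function name `lint_inheritance` is hypothetical.

```python
from typing import Dict, List


def lint_inheritance(elements: Dict[str, dict]) -> List[str]:
    """Return warnings for the SHOULD-level rules on is_a and mixins."""
    warnings = []
    for name, defn in elements.items():
        defn = defn or {}
        parent = defn.get("is_a")
        if parent is not None:
            parent_defn = elements[parent] or {}
            # is_a should connect like with like (mixin to mixin, non-mixin to non-mixin).
            if bool(defn.get("mixin")) != bool(parent_defn.get("mixin")):
                warnings.append(f"{name}: is_a {parent} connects a mixin and a non-mixin")
        for m in defn.get("mixins", []):
            # Every entry in the mixins property should itself be declared as a mixin.
            if not (elements[m] or {}).get("mixin"):
                warnings.append(f"{name}: mixins entry {m} is not declared as a mixin")
    return warnings


classes = {
    "NamedThing": {},
    "Located": {"mixin": True},
    "Gene": {"is_a": "NamedThing", "mixins": ["Located"]},
    "Bad": {"is_a": "Located", "mixins": ["NamedThing"]},
}
for warning in lint_inheritance(classes):
    print(warning)
```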

Core elements: Classes, Slots, and Types (Normative)

Slots (Normative)

See SlotDefinition.

Slots are properties that can be assigned to classes.

The set of slots available in a model is defined in a slot dictionary, declared at the schema level:

slots:
  SLOT_NAME_1: DEFINITION_1
  SLOT_NAME_2: DEFINITION_2
  ...
  SLOT_NAME_m: DEFINITION_m

Each key in the dictionary is the slot name. The slot name must be unique.

Class Slots (Normative)

The SlotDefinition is described in the metamodel.

Slot Hierarchies (Normative)

Each slot must have zero or one is_a parents, as defined by biolinkml:is_a.

In addition, a slot may have multiple mixin parents, as defined by biolinkml:mixins.

We define the function ancestors*(s) as the reflexive transitive closure of the parent relation on slots, where the parents of a slot are the union of its is_a parent and its mixins. The result includes slot s itself.
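
The closure described above can be sketched as a graph traversal. This example assumes the slot dictionary has been loaded into a plain Python dict whose definitions carry optional `is_a` and `mixins` keys; the function name `ancestors` and the sample slot names are illustrative only. The same traversal would apply unchanged to a class dictionary.

```python
from typing import Dict, Optional, Set


def ancestors(name: str, elements: Dict[str, Optional[dict]]) -> Set[str]:
    """Reflexive transitive closure over is_a and mixins, starting from `name`."""
    seen: Set[str] = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against accidental cycles
        seen.add(current)
        defn = elements.get(current) or {}
        parents = list(defn.get("mixins", []))
        if defn.get("is_a"):
            parents.append(defn["is_a"])
        stack.extend(parents)
    return seen


slots = {
    "related to": {},
    "symmetric": {"mixin": True},
    "interacts with": {"is_a": "related to", "mixins": ["symmetric"]},
    "binds": {"is_a": "interacts with"},
}
print(sorted(ancestors("binds", slots)))
```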

Classes and Class Slots (Normative)

In biolinkml, as in the Web Ontology Language (OWL), a class is a classification of individuals into groups that share common characteristics. If an individual is a member of a class, this tells a machine reader that it falls under the semantic classification given by the class.

The set of classes available in a model is defined in a class dictionary, declared at the schema level. A class may have any number of slots declared.

classes:

  CLASS_NAME_1:
    slots:
      - CLASS_1_SLOT_NAME_1
      - CLASS_1_SLOT_NAME_2
      - ...
      - CLASS_1_SLOT_NAME_n
  ...
  CLASS_NAME_p:
    slots:
      - CLASS_p_SLOT_NAME_1
      - CLASS_p_SLOT_NAME_2
      - ...
      - CLASS_p_SLOT_NAME_q

Each declared slot must be defined in the slot dictionary. Note, however, that using a declared slot in instances of a class is not mandatory unless the slot_usage.required property of the slot is declared 'true', either directly in that class or indirectly through inheritance from a parent.
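
The requirement that every declared slot be defined in the slot dictionary can be checked mechanically. The sketch below assumes the schema YAML has already been parsed into nested Python dicts; `undefined_slots` and the sample schema are hypothetical and only illustrate the rule.

```python
from typing import Dict, List


def undefined_slots(schema: dict) -> Dict[str, List[str]]:
    """Map each class name to the declared slots missing from the slot dictionary."""
    defined = set(schema.get("slots") or {})
    missing: Dict[str, List[str]] = {}
    for class_name, class_def in (schema.get("classes") or {}).items():
        declared = (class_def or {}).get("slots", [])
        unknown = [s for s in declared if s not in defined]
        if unknown:
            missing[class_name] = unknown
    return missing


schema = {
    "slots": {"id": {}, "name": {}},
    "classes": {
        "NamedThing": {"slots": ["id", "name"]},
        "Gene": {"slots": ["id", "symbol"]},  # 'symbol' is not in the slot dictionary
    },
}
print(undefined_slots(schema))
```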

Class Hierarchies (Normative)

Each class must have zero or one is_a parents, as defined by biolinkml:is_a.

In addition, a class may have multiple mixin parents, as defined by biolinkml:mixins.

We define the function ancestors*(c) as the reflexive transitive closure of the parent relation on classes, where the parents of a class are the union of its is_a parent and its mixins. The result includes class c itself.

Slot Usages

Each class may declare a slot_usage dictionary.

  CLASS:
    slot_usage:
      SLOT_1: USAGE_1
      SLOT_2: USAGE_2
      ...
      SLOT_n: USAGE_n

These refine a slot definition in the context of a particular class. The effective value of property p of slot s in class c is resolved as follows:

def effective_slot_property(s, p, c):
   if c declares a slot_usage for s that sets p:
      use the value for p
   elif any m in mixins(c) declares a slot_usage for s that sets p:
      use the value for p from this mixin
   else:
      if effective_slot_property(s, p, c.is_a) yields a value:
         use this
      elif effective_slot_property(s, p, m) yields a value for any mixin m of c:
         use this
      else:
         use the default value of p for s
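
The resolution order above can be made concrete with a small runnable sketch. It assumes the slot and class dictionaries are plain Python dicts, that the inheritance graph is acyclic, and that "the default value of p for s" means the value given in the slot's own definition. The helper names and the sentinel `_MISSING` are inventions of this example. Unlike the truthiness test in the pseudocode, the sentinel lets an explicitly declared false value override an inherited one.

```python
_MISSING = object()  # distinguishes "not declared" from declared falsy values


def _declared(classes, class_name, slot, prop):
    """Value of `prop` in the slot_usage that `class_name` declares for `slot`, if any."""
    usage = ((classes[class_name] or {}).get("slot_usage") or {}).get(slot) or {}
    return usage.get(prop, _MISSING)


def _lookup(classes, class_name, slot, prop):
    class_def = classes[class_name] or {}
    mixins = list(class_def.get("mixins", []))
    # 1. the class's own slot_usage, then 2. slot_usage declared directly on its mixins
    for candidate in [class_name] + mixins:
        value = _declared(classes, candidate, slot, prop)
        if value is not _MISSING:
            return value
    # 3. recurse through the is_a parent, then 4. recurse through each mixin
    parent = class_def.get("is_a")
    for ancestor in ([parent] if parent else []) + mixins:
        value = _lookup(classes, ancestor, slot, prop)
        if value is not _MISSING:
            return value
    return _MISSING


def effective_slot_property(slots, classes, slot, prop, class_name):
    value = _lookup(classes, class_name, slot, prop)
    if value is not _MISSING:
        return value
    # 5. fall back to the slot's own definition (None if the property is unset there)
    return (slots.get(slot) or {}).get(prop)


slots = {"name": {"required": False, "range": "string"}}
classes = {
    "NamedThing": {"slot_usage": {"name": {"required": True}}},
    "Labelled": {"mixin": True, "slot_usage": {"name": {"range": "label type"}}},
    "Gene": {"is_a": "NamedThing", "mixins": ["Labelled"]},
}
print(effective_slot_property(slots, classes, "name", "required", "Gene"))
print(effective_slot_property(slots, classes, "name", "range", "Gene"))
```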

Slot Validation

This section describes which slots can be used to describe which instances.

Consider instance i of type t, with an association of type s connecting to object or instance j.

{
  ## instance i of type t
  "s": <j>
}

For this to be valid, it must be the case that:

  • for all r

Domain Declarations

Built-in Types (Informative)

You can import standard types:

imports:
  - biolinkml:types

The imported schema includes in its type dictionary entries such as:

  date:
    uri: xsd:date
    base: XSDDate
    repr: str

See types.yaml.

Glossary of terms (Informative)