CFDE Data Distillery Project

UBKG

The Unified Biomedical Knowledge Graph (UBKG) is a knowledge graph database that represents a set of interrelated concepts from biomedical ontologies and vocabularies. The UBKG combines information from the National Library of Medicine's Unified Medical Language System (UMLS) with assertions from “non-UMLS” ontologies or vocabularies, including:

Ontologies published in references such as the NCBO BioPortal and the OBO Foundry.
Custom ontologies derived from data sources such as UNIPROTKB.
Other custom ontologies, such as those for the HuBMAP platform.

An important goal of the UBKG is to establish connections between ontologies. For example, if information on the relationships between proteins and genes described in one ontology can be connected to information on the relationships between genes and diseases described in another ontology, it may be possible to identify previously unknown relationships between proteins and diseases.
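
The idea of connecting assertions across ontologies can be illustrated with a minimal sketch. The identifiers and the `infer_protein_disease` helper are hypothetical and exist only for illustration; the UBKG itself stores such links as graph relationships rather than Python lists.

```python
from collections import defaultdict

# Hypothetical assertions from two separate ontologies (illustrative identifiers only).
protein_gene = [("PROTEIN:P1", "GENE:G1"), ("PROTEIN:P2", "GENE:G2")]
gene_disease = [("GENE:G1", "DISEASE:D1"), ("GENE:G1", "DISEASE:D2")]


def infer_protein_disease(protein_gene, gene_disease):
    """Join protein-gene and gene-disease pairs on the shared gene."""
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    inferred = set()
    for protein, gene in protein_gene:
        # A protein with no gene-disease match contributes nothing.
        for disease in diseases_by_gene.get(gene, ()):
            inferred.add((protein, disease))
    return sorted(inferred)


candidates = infer_protein_disease(protein_gene, gene_disease)
```

Each pair in `candidates` is only a candidate relationship between a protein and a disease; whether it is biologically meaningful still has to be evaluated.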

Components and generation frameworks

The primary components of the UBKG are:

a graph database, deployed in neo4j
a REST API that provides access to the information in the graph database

The UBKG database is populated by loading a set of CSV files with [neo4j-admin import](https://neo4j.com/docs/operations-manual/current/tutorial/neo4j-admin-import/). The CSV import files are produced by two frameworks: the source framework and the generation framework.
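
As a rough illustration of what CSV files for neo4j-admin import look like, the sketch below writes one node file and one relationship file in memory. Only the header fields `:ID`, `:LABEL`, `:START_ID`, `:END_ID`, and `:TYPE` come from the neo4j-admin import header format; the column names, label, relationship type, and values are hypothetical and do not reflect the actual UBKG import files.

```python
import csv
import io


def to_csv(header, rows):
    """Serialize a header row and data rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Node file: ":ID" marks the unique identifier, ":LABEL" the node label.
# "conceptId", "name", and "Concept" are hypothetical.
nodes_csv = to_csv(
    ["conceptId:ID", "name", ":LABEL"],
    [["X1", "Example concept one", "Concept"],
     ["X2", "Example concept two", "Concept"]],
)

# Relationship file: the endpoints refer to node IDs, ":TYPE" names the relationship.
# "RELATED_TO" is a hypothetical relationship type.
edges_csv = to_csv(
    [":START_ID", ":END_ID", ":TYPE"],
    [["X1", "X2", "RELATED_TO"]],
)
```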

UBKG API

The UBKG prohibits direct Cypher access to the neo4j knowledge graph database. The UBKG API is a REST API with endpoints that can be used to return information from the UBKG.

The UBKG API is described in this SmartAPI page.
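
A client would typically call the API over HTTP and decode a JSON response. The following sketch shows that pattern under stated assumptions: the host `ubkg-api.example.org` is a placeholder, and the `concepts/.../codes` path is a hypothetical endpoint used only for illustration. The actual base URL, endpoints, and parameters are those listed in the API documentation.

```python
import json
from urllib.parse import quote, urlencode, urljoin
from urllib.request import urlopen

# Placeholder host (assumption); substitute the base URL from the API documentation.
BASE_URL = "https://ubkg-api.example.org/"


def build_url(base_url, *path_parts, **params):
    """Join path segments onto the base URL and append optional query parameters."""
    path = "/".join(quote(str(part), safe="") for part in path_parts)
    url = urljoin(base_url, path)
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_json(url, timeout=30):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Hypothetical endpoint: codes associated with a concept identifier.
url = build_url(BASE_URL, "concepts", "C0000005", "codes")
```

Calling `fetch_json(url)` would then perform the request; it is left uncalled here because the host is a placeholder.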

Source framework

The source framework is a combination of manual and automated processes that obtain the base set of nodes (entities) and edges (relationships) of the UBKG graph.

The source framework is also known as the UMLS-Graph.

Information on the concepts in the ontologies and vocabularies that are integrated into the UMLS Metathesaurus can be downloaded using the MetamorphoSys application. MetamorphoSys can be configured to download subsets of the entire UMLS.
Additional semantic information related to the UMLS can be downloaded manually from the Semantic Network.

The result of the Metathesaurus and Semantic Network downloads is a set of files in Rich Release Format (RRF). The RRF files contain information on source vocabularies or ontologies, codes, terms, and relationships both with other codes in the same vocabularies and with UMLS concepts.

The RRF files are loaded into a data mart. A Python script then executes SQL scripts that perform Extraction, Transformation, and Loading (ETL) of the RRF data into a set of twelve temporary tables. These tables are exported as CSV files, which become the UMLS CSVs.
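
To make the RRF-to-CSV step more concrete, here is a minimal Python-only sketch. It is not the source framework's implementation, which uses SQL scripts in a data mart. The column list is the documented layout of the Metathesaurus file MRCONSO.RRF; the sample row and the choice of output columns are hypothetical.

```python
import csv
import io

# Column order of MRCONSO.RRF as documented for the UMLS Metathesaurus.
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_rrf_line(line, columns):
    """Split one pipe-delimited RRF row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("|")
    # RRF rows end with a trailing pipe, which yields one extra empty field.
    if fields and fields[-1] == "":
        fields.pop()
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    return dict(zip(columns, fields))


# Synthetic row for illustration only; it is not real UMLS content.
sample = "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||||EXAMPLE|PT|X123|Example term|0|N||\n"
row = parse_rrf_line(sample, MRCONSO_COLUMNS)

# Export a hypothetical subset of columns to CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["CUI", "SAB", "CODE", "STR"], extrasaction="ignore")
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
```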

Generation framework

The UMLS CSVs can be loaded into neo4j to build a graph version of the UMLS, including concepts and relationships from over 150 vocabularies and ontologies that are integrated into the UMLS, such as SNOMED CT, ICD10, and NCI.

The UBKG extends the UMLS graph by integrating additional concepts and relationships from sources outside of the UMLS. These include a number of standard biomedical ontologies published in NCBO BioPortal as well as custom sources:

Ontology or Source	Description
PATO	Phenotypic Quality Ontology
UBERON	Uber Anatomy Ontology
CL	Cell Ontology
DOID	Human Disease Ontology
OBI	Ontology for Biomedical Investigations
EDAM	EDAM
HSAPDV	Human Developmental Stages Ontology
SBO	Systems Biology Ontology
MI	Molecular Interactions
CHEBI	Chemical Entities of Biological Interest Ontology
MP	Mammalian Phenotype Ontology
ORDO	Orphanet Rare Disease Ontology
UO	Units of Measurement Ontology
UNIPROTKB	Protein-gene relationships from UniProtKB
HUSAT	HuBMAP Samples Added Terms
HUBMAP	Application ontology supporting the infrastructure of the HuBMAP Consortium
CCF	Human Reference Atlas Common Coordinate Framework Ontology
MONDO	MONDO Disease Ontology
EFO	Experimental Factor Ontology
SENNET	Application ontology supporting the infrastructure of the SenNet Consortium

The generation framework is a suite of scripts that:

extract information on assertions (also known as triples, or subject-predicate-object relationships) found in ontologies or derived from other sources;
iteratively add the assertion information to the base set of UMLS CSVs to create a set of ontology CSVs.

Once a set of ontology CSVs is ready, they can be imported into a neo4j database to form a new instance of the UBKG.
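
The iterative step described above can be sketched as a loop that merges each source's assertions into a growing set of nodes and edges. This is a simplified illustration, not the generation framework's scripts: the identifiers, predicates, source names, and the `add_assertions` helper are all hypothetical, and in-memory sets stand in for the CSV files.

```python
def add_assertions(nodes, edges, triples):
    """Return new node and edge sets with the triples merged in, skipping duplicates."""
    nodes = set(nodes)
    edges = set(edges)
    for subject, predicate, obj in triples:
        nodes.update((subject, obj))
        edges.add((subject, predicate, obj))
    return nodes, edges


# Stand-in for the base UMLS CSVs (hypothetical identifiers).
base_nodes = {"UMLS:C1", "UMLS:C2"}
base_edges = {("UMLS:C1", "isa", "UMLS:C2")}

# Assertions extracted from two hypothetical sources, processed one after another.
ontology_batches = {
    "ONTO_A": [("ONTO_A:1", "part_of", "UMLS:C1")],
    "ONTO_B": [("ONTO_B:9", "isa", "ONTO_A:1"),
               ("ONTO_A:1", "part_of", "UMLS:C1")],  # duplicate of an ONTO_A edge
}

nodes, edges = base_nodes, base_edges
for source_name, triples in ontology_batches.items():
    nodes, edges = add_assertions(nodes, edges, triples)
```

Because each pass starts from the output of the previous one, a later source can attach to nodes introduced by an earlier source, as `ONTO_B:9` does with `ONTO_A:1` here.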

The generation framework can work with:

data from ontologies published in Web Ontology Language (OWL) files that conform to the principles of the OBO Foundry.
data from private or custom ontologies that are in the SimpleKnowledge format. (SimpleKnowledge is a lightweight, spreadsheet-based ontology editor developed by Pitt UBMI.)
assertion data that conforms to the UBKG Edge/Node format.

PheKnowLator and OWLNETS

The generation framework obtains assertion data from OWL files with scripts that are based on the Phenotype Knowledge Translator (PheKnowLator) application. PheKnowLator converts information from an OWL file into the OWL-NETS (OWL NEtwork Transformation for Statistical learning) format.
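
The general idea of turning OWL content into subject-predicate-object edges can be shown with a very small sketch. This is not PheKnowLator's algorithm and does not produce the actual OWL-NETS format; it only extracts simple `rdfs:subClassOf` statements from a tiny RDF/XML snippet using the Python standard library. The ontology snippet and the `subclass_triples` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Tiny hypothetical ontology in RDF/XML, for illustration only.
OWL_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/onto#Neuron">
    <rdfs:subClassOf rdf:resource="http://example.org/onto#Cell"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/onto#Cell"/>
</rdf:RDF>
"""


def subclass_triples(owl_xml):
    """Collect (subject, predicate, object) triples for named subClassOf axioms."""
    root = ET.fromstring(owl_xml)
    triples = []
    for cls in root.iter(f"{{{OWL}}}Class"):
        subject = cls.get(f"{{{RDF}}}about")
        if subject is None:
            continue  # skip anonymous class expressions
        for axiom in cls.findall(f"{{{RDFS}}}subClassOf"):
            obj = axiom.get(f"{{{RDF}}}resource")
            if obj is not None:
                triples.append((subject, "subClassOf", obj))
    return triples


triples = subclass_triples(OWL_XML)
```

Real OWL files contain restrictions, anonymous classes, and annotations that this sketch ignores; handling those is the kind of work that a dedicated conversion tool performs.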
