An End-to-End Graph Neural Network for Disease Gene Prioritization

samurous/DGP_end_to_end_CGN

Supplementary repository for the research on "An End-to-End Graph Neural Network for Disease Gene Prioritization".

This repository includes

  • All code needed to reproduce the experiments.
  • Instructions to set up the environment used to run the experiments.
  • The data sources needed as inputs for the experiments.
  • The evaluation results.
  • The pretrained models.

The process of the experiments is documented in

Setup

Preliminaries

The experiments were performed using Python 3.7 on the following hardware:

  • AMD Ryzen 7 2700X Eight-Core Processor
  • 34 GB RAM
  • GeForce RTX 2080 Ti

Set up the Python environment.

Using Conda

conda create --name dpg_gnn python=3.7
conda install -y -q --name dpg_gnn -c conda-forge --file requirements.txt
conda activate dpg_gnn

Using Virtualenv + pip

virtualenv dpg_gnn -p `which python3.7`
source dpg_gnn/bin/activate
pip install -r requirements.txt
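
After either setup route, a quick sanity check can confirm that the active interpreter matches the Python 3.7 version stated above. The following is a minimal sketch, not part of the repository: the helper names `python_matches` and `is_installed` are hypothetical, and checking for `torch` is an assumption based on the `.pt` files listed under Content, since the contents of requirements.txt are not shown here.

```python
import importlib.util
import sys

# Python version stated in the Preliminaries section.
REQUIRED = (3, 7)


def python_matches(version_info=sys.version_info, required=REQUIRED):
    """Return True when the major.minor version equals the required one."""
    return tuple(version_info[:2]) == tuple(required)


def is_installed(package_name):
    """Return True when a top-level package can be located without importing it."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("Python 3.7 active:", python_matches())
    # Assumption: torch is one of the packages in requirements.txt.
    print("torch installed:", is_installed("torch"))
```

Run it inside the activated `dpg_gnn` environment; if the first check reports `False`, the environment was probably not activated.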

Data sources

Description:
HumanNet v2: human gene networks for disease research.

Source:
inetbio.org/humannet

Description:
Names and OMIM Ids of all diseases covered in this experiment.

Source:

Description:
Human Phenotype Ontology annotations associated with diseases via OMIM ids.

Source:
Human Phenotype Ontology

Description:
Titles and abstracts of publications associated with diseases identified via OMIM ids.

Source:
NCBI pubmed

Description:
Gene expression conditions extracted from a human gene expression atlas of 5372 samples representing 369 different cell and tissue types, disease states and cell lines.

Source:
https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-62/

Description:
Human Phenotype Ontology annotations associated with genes via NCBI Entrez gene ids.

Source:
Human Phenotype Ontology

Description:
Gene-disease class associations as created by the Human Disease Network.

Source:

Description:
Gene Ontology terms associated with genes by Entrez gene ids.

Example:

ID  Parent      Evidence code   GO term
18  GO:0001666  IEA             GO:0009628

This row means that gene 18 is annotated with the term GO:0009628, where GO:0001666 is the parent and the GO evidence code is IEA (Inferred from Electronic Annotation).
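
The row layout in the example can be read programmatically as follows. This is a minimal sketch under stated assumptions: the `GoAnnotation` type and `parse_go_line` helper are hypothetical names, and the parser assumes four whitespace-separated fields in the order shown in the example (ID, Parent, Evidence code, GO term), which may not match every line of the actual file.

```python
from typing import NamedTuple


class GoAnnotation(NamedTuple):
    """One row of the example table (hypothetical container type)."""
    gene_id: int        # Entrez gene id
    parent: str         # parent GO term
    evidence_code: str  # GO evidence code, e.g. IEA
    go_term: str        # GO term assigned to the gene


def parse_go_line(line):
    """Parse one data row, assuming four whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected 4 fields, got %d: %r" % (len(fields), line))
    gene_id, parent, evidence_code, go_term = fields
    return GoAnnotation(int(gene_id), parent, evidence_code, go_term)


# The example row from above.
row = parse_go_line("18  GO:0001666  IEA             GO:0009628")
print(row.gene_id, row.evidence_code, row.go_term)  # prints: 18 IEA GO:0009628
```

Splitting on arbitrary whitespace handles both tab-separated and space-aligned rows; a header line would need to be skipped before calling the parser.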

Source:
The Gene Ontology Resource

Description:
Genes associated with diseases (Entrez gene id, OMIM disease id).

Source:

Description:
Genes associated with diseases (Entrez gene id, OMIM disease id).

Source:

Source:

Content

.
|-- DiseaseNet.py
|-- GeneNet.py
|-- README.md
|-- TheModel.py
|-- data_sources
|   |-- CTD_chemicals_diseases.tsv.gz
|   |-- HumanNet-FN.tsv
|   |-- HumanNet-XN.tsv
|   |-- all_diseases.tsv
|   |-- disease_hpo.tsv
|   |-- disease_net_pubmed_knn
|   |   |-- processed
|   |   |   |-- data.pt
|   |   |   |-- disease_id_feature_index_mapping.txt
|   |   |   |-- edges.pt
|   |   |   |-- nodes.pt
|   |   |   |-- pre_filter.pt
|   |   |   `-- pre_transform.pt
|   |   `-- raw
|   |       |-- CTD_chemicals_diseases.tsv.gz
|   |       |-- all_diseases.tsv
|   |       |-- disease_hpo.tsv
|   |       |-- disease_pathway.tsv
|   |       `-- disease_publication_titles_and_abstracts.tsv
|   |-- disease_pathway.tsv
|   |-- disease_publication_titles_and_abstracts.tsv
|   |-- extracted_disease_class_assignments.tsv
|   |-- gene_expressions.tsv
|   |-- gene_gtex_rna_seq_expressions.tsv
|   |-- gene_hpo_disease.tsv
|   |-- gene_net_fn_hpo
|   |   |-- processed
|   |   |   |-- data.pt
|   |   |   |-- edges.pt
|   |   |   |-- gene_id_data_index.tsv
|   |   |   |-- nodes.pt
|   |   |   |-- pre_filter.pt
|   |   |   `-- pre_transform.pt
|   |   `-- raw
|   |       |-- HumanNet-FN.tsv
|   |       |-- gene_expressions.tsv
|   |       |-- gene_gtex_rna_seq_expressions.tsv
|   |       |-- gene_hpo_disease.tsv
|   |       |-- gene_ontologies.tsv
|   |       `-- gene_pathway_associations.tsv
|   |-- gene_ontologies.tsv
|   |-- gene_pathway_associations.tsv
|   |-- genes_diseases.tsv
|   `-- genes_diseases_mgi_only.tsv
|-- experiments
|   |-- disease_gene_classification.ipynb
|   |-- disease_gene_prioritization.ipynb
|   `-- results
|       `-- final
|           |-- Disease_gene_prediction_ROC_by_fold_monogenic_diseases.pdf
|           |-- Disease_gene_prediction_ROC_by_fold_multigenic_diseases.pdf
|           |-- Disease_gene_prediction_ROC_combined.pdf
|           |-- disease_classification_results.gz
|           |-- disease_gene_classification_result_bar_chart_Pr-auc.pdf
|           |-- disease_gene_classification_result_bar_chart_ROC-auc.pdf
|           |-- disease_gene_classification_result_bar_chart_ROC-auc_Pr-auc_fmax.pdf
|           |-- disease_gene_classification_result_bar_chart_fmax.pdf
|           |-- final_hyperparameters_dis_dict.pickle.gz
|           |-- final_hyperparameters_metrics.pickle.gz
|           |-- model_fold_1.ptm
|           |-- model_fold_2.ptm
|           |-- model_fold_3.ptm
|           |-- model_fold_4.ptm
|           `-- model_fold_5.ptm
`-- requirements.txt
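
Several result files in the tree above end in `.pickle.gz`. The sketch below shows one way to open such a file, assuming (from the extension only) that it is a gzip-compressed Python pickle; the helper name `load_gzipped_pickle` is hypothetical and the structure of the loaded object is not documented here. Only unpickle files from a source you trust, because unpickling can execute arbitrary code.

```python
import gzip
import pickle


def load_gzipped_pickle(path):
    """Decompress a gzip file and unpickle its contents.

    Assumes the file is a gzip-compressed pickle, as the .pickle.gz
    extension suggests.
    """
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


# Example call (path taken from the tree above):
# metrics = load_gzipped_pickle(
#     "experiments/results/final/final_hyperparameters_metrics.pickle.gz"
# )
```

If the pickled objects reference classes from the repository or from installed packages, those must be importable in the environment where the file is loaded.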
