A toolkit for training deep learning models on genotype, tabular, sequence, image, array and binary data.

arnor-sigurdsson/EIR

Supervised modelling, sequence generation, image generation, array output and survival analysis on genotype, tabular, sequence, image, array, and binary input data.

WARNING: This project is in alpha phase. Expect backwards-incompatible changes, including API changes, between minor versions.

Table of Contents

  1. Install
  2. Usage
  3. Use Cases
  4. Features
  5. Supported Inputs and Outputs
  6. Related Projects
  7. Citation
  8. Acknowledgements

Install

Installing EIR via pip

pip install eir-dl

Important: The latest version of EIR requires Python 3.12. With an older version of Python, pip will install an outdated version of EIR, which will likely be incompatible with the current documentation and might contain bugs. Please ensure you are using Python 3.12 before installing.
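As a minimal sketch of the check described above, the following snippet tests whether the active interpreter is older than 3.12 before you run pip. The helper name python_is_too_old is hypothetical and not part of EIR, and the snippet assumes a POSIX shell with python3 on the PATH.

```shell
# Hypothetical helper (not part of EIR): succeeds when the given
# version string, e.g. "3.11" or "3.12.4", is older than 3.12.
python_is_too_old() {
    major="${1%%.*}"      # text before the first dot
    rest="${1#*.}"        # text after the first dot
    minor="${rest%%.*}"   # minor version, ignoring any patch level
    [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 12 ]; }
}

# Report on the active interpreter if one is available.
if command -v python3 >/dev/null 2>&1; then
    version="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
    if python_is_too_old "$version"; then
        echo "Python $version is older than 3.12: pip would install an outdated eir-dl" >&2
    else
        echo "Python $version detected: run 'pip install eir-dl'"
    fi
fi
```

The snippet only reports the result and leaves the actual pip install eir-dl step to you, so it is safe to run in any environment.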

Installing EIR via Container Engine

Here's an example with Docker:

docker build -t eir:latest https://raw.githubusercontent.com/arnor-sigurdsson/EIR/master/Dockerfile
docker run -d --name eir_container eir:latest
docker exec -it eir_container bash

Usage

Please refer to the Documentation for examples and information.

Use Cases

EIR allows for training and evaluating various deep-learning models directly from the command line. This can be useful for:

  • Quick prototyping and iteration when doing supervised modelling or sequence generation on new datasets.
  • Establishing baselines to compare against other methods.
  • Fitting on data sources such as large-scale genomics, where DL implementations are not commonly available.

If you are an ML/DL researcher developing new models, EIR might not fit your use case. However, it might provide a quick baseline for comparison to the cool stuff you are developing, and some degree of customization is possible.

Features

Supported Inputs and Outputs

Modality   Input   Output
Genotype   x       †
Tabular    x       x
Sequence   x       x
Image      x       x
Array      x       x
Binary     x
Survival   n/a     x

† While not directly supported, genotypes can be treated as arrays. For an example, see the MNIST Digit Generation tutorial.

Related Projects

  • EIR-auto-GP: Automated genomic prediction (GP) using deep learning models with EIR.

Citation

If you use EIR in a scientific publication, we would appreciate it if you could use one of the following citations:

@article{10.1093/nar/gkad373,
    author    = {Sigurdsson, Arn{\'o}r I and Louloudis, Ioannis and Banasik, Karina and Westergaard, David and Winther, Ole and Lund, Ole and Ostrowski, Sisse Rye and Erikstrup, Christian and Pedersen, Ole Birger Vesterager and Nyegaard, Mette and DBDS Genomic Consortium and Brunak, S{\o}ren and Vilhj{\'a}lmsson, Bjarni J and Rasmussen, Simon},
    title     = {{Deep integrative models for large-scale human genomics}},
    journal   = {Nucleic Acids Research},
    month     = {05},
    year      = {2023}
}

@article{sigurdsson2024non,
  title={Non-linear genetic regulation of the blood plasma proteome},
  author={Sigurdsson, Arnor I and Gr{\"a}f, Justus F and Yang, Zhiyu and Ravn, Kirstine and Meisner, Jonas and Thielemann, Roman and Webel, Henry and Smit, Roelof AJ and Niu, Lili and Mann, Matthias and others},
  journal={medRxiv},
  pages={2024--07},
  year={2024},
  publisher={Cold Spring Harbor Laboratory Press}
}

@article{sigurdsson2022improved,
    author    = {Sigurdsson, Arnor Ingi and Ravn, Kirstine and Winther, Ole and Lund, Ole and Brunak, S{\o}ren and Vilhjalmsson, Bjarni J and Rasmussen, Simon},
    title     = {Improved prediction of blood biomarkers using deep learning},
    journal   = {medRxiv},
    pages     = {2022--10},
    year      = {2022},
    publisher = {Cold Spring Harbor Laboratory Press}
}

Acknowledgements

Massive thanks to everyone publishing and developing the packages this project directly and indirectly depends on.