Supervised modelling, sequence generation, image generation, array output, and survival analysis on genotype, tabular, sequence, image, array, and binary input data.
WARNING: This project is in alpha phase. Expect backwards incompatible changes and API changes between minor versions.
- Install
- Usage
- Use Cases
- Features
- Supported Inputs and Outputs
- Related Projects
- Citation
- Acknowledgements
## Install

```bash
pip install eir-dl
```
Important: The latest version of EIR requires Python 3.12. Using an older version of Python will install an outdated version of EIR, which will likely be incompatible with the current documentation and might contain bugs. Please ensure you are using Python 3.12.
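To make sure the installation runs under Python 3.12, one option is to create a dedicated virtual environment first. A minimal sketch using the standard `venv` module (the environment name is arbitrary):

```bash
# Create and activate a virtual environment pinned to Python 3.12
python3.12 -m venv eir-env
source eir-env/bin/activate

# Confirm the interpreter version before installing
python --version  # should print Python 3.12.x

pip install eir-dl
```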
Here's an example with Docker:

```bash
# Build the EIR image directly from the Dockerfile in the repository
docker build -t eir:latest https://raw.githubusercontent.com/arnor-sigurdsson/EIR/master/Dockerfile

# Start a container in the background, then open an interactive shell inside it
docker run -d --name eir_container eir:latest
docker exec -it eir_container bash
```
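Once inside the container, the EIR command-line tools should be on the PATH; for example (assuming the training entry point is named `eirtrain`, as in the documentation):

```bash
eirtrain --help
```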
## Usage

Please refer to the Documentation for examples and information.
## Use Cases

EIR allows for training and evaluating various deep-learning models directly from the command line. This can be useful for:
- Quick prototyping and iteration when doing supervised modelling or sequence generation on new datasets.
- Establishing baselines to compare against other methods.
- Fitting on data sources such as large-scale genomics, where DL implementations are not commonly available.
If you are an ML/DL researcher developing new models, EIR might not fit your use case. However, it can provide a quick baseline to compare against the methods you are developing, and some degree of customization is possible.
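As a rough illustration of the command-line workflow, a run is described with a few YAML configuration files and started with the `eirtrain` command. The sketch below is illustrative only: the exact configuration keys vary between versions, so treat the field names and paths as placeholders and consult the documentation for the current schema.

```yaml
# input.yaml -- one file per input modality (field names illustrative)
input_info:
  input_source: data/genotype_arrays/
  input_name: genotype
  input_type: omics
```

```yaml
# output.yaml -- defines the supervised target(s) (field names illustrative)
output_info:
  output_source: data/targets.csv
  output_name: phenotype
  output_type: tabular
output_type_info:
  target_cat_columns:
    - Origin
```

```bash
# Launch training; globals.yaml would hold run-wide settings such as
# the output folder and number of epochs
eirtrain \
  --global_configs globals.yaml \
  --input_configs input.yaml \
  --output_configs output.yaml
```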
## Features

- **General**
  - Train models directly from the command line through `.yaml` configuration files, as sketched above.
  - Training on genotype, tabular, sequence, image, array and binary input data, with various modality-specific settings available.
  - Seamless multi-modal training, e.g., combining text + image + tabular data, or any combination of the modalities above (see the command sketch after this list).
  - Train multiple feature extractors on the same data source, e.g., combining a vanilla transformer, Longformer and a pre-trained BERT variant for text classification.
  - Support for checkpointing and continued training, as well as pretraining and transferring parts of trained models to new tasks.
- **Supervised Learning**
  - Supports continuous (i.e., regression) and categorical (i.e., classification) targets.
  - Multi-task / multi-label prediction supported out-of-the-box.
  - Model explainability for genotype, tabular, sequence, image and array data built in.
  - Computes and graphs various evaluation metrics during training (e.g., RMSE, PCC and R2 for regression tasks; accuracy, ROC-AUC, etc. for classification tasks).
- **Sequence Generation**
  - Supports various sequence generation tasks, including basic sequence generation, sequence to sequence transformations, and image to sequence transformations. For more information, refer to the respective tutorials: sequence generation, sequence to sequence, image to sequence and tabular to sequence.
- **Image Generation**
  - Image generation is supported. For more information, refer to the respective tutorials: Building a Simple Image Autoencoder, Image Colorization and Super-Resolution, and Guided Diffusion for Image Generation.
- **Array Output**
  - Supports array output tasks, such as building simple autoencoders for tasks like MNIST Digit Generation.
- **Time Series**
  - Time series inputs and outputs are supported, demonstrated through Transformer-based Power Consumption Prediction and Stock Price Prediction Using Transformers, One-shot and Diffusion Models.
- **Survival Analysis**
  - Time-to-event prediction is supported as an output type, demonstrated through Patient Survival Prediction using Free Light Chain Data and Survival Analysis Using Cox Proportional Hazards Model.
- Many more settings and configurations (e.g., augmentation, regularization, optimizers) available.
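To make the multi-modal point above concrete, combining modalities amounts to passing one input configuration per modality in a single run. A hedged sketch (file names are placeholders; whether the flag accepts multiple files this way should be checked against the documentation's multi-modal tutorials):

```bash
# Train one model on text + image + tabular inputs together
eirtrain \
  --global_configs globals.yaml \
  --input_configs input_text.yaml input_image.yaml input_tabular.yaml \
  --output_configs output.yaml
```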
## Supported Inputs and Outputs

| Modality | Input | Output |
|----------|-------|--------|
| Genotype | x     | †      |
| Tabular  | x     | x      |
| Sequence | x     | x      |
| Image    | x     | x      |
| Array    | x     | x      |
| Binary   | x     |        |
| Survival | n/a   | x      |
† While not directly supported, genotypes can be treated as arrays. For example, see the MNIST Digit Generation tutorial.
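As a sketch of that workaround, per-sample genotype data stored as NumPy arrays could be declared as an array output (field names are illustrative and may differ between versions; check the array tutorials in the documentation for the current schema):

```yaml
# output_array.yaml -- treating per-sample genotype arrays as an array output
output_info:
  output_source: data/genotype_arrays/  # e.g., a folder of per-sample .npy files
  output_name: genotype
  output_type: array
```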
## Related Projects

- EIR-auto-GP: Automated genomic prediction (GP) using deep learning models with EIR.
## Citation

If you use EIR in a scientific publication, we would appreciate it if you used one of the following citations:
- Deep integrative models for large-scale human genomics
- Non-linear genetic regulation of the blood plasma proteome
- Improved prediction of blood biomarkers using deep learning
```bibtex
@article{10.1093/nar/gkad373,
  author  = {Sigurdsson, Arn{\'o}r I and Louloudis, Ioannis and Banasik, Karina and Westergaard, David and Winther, Ole and Lund, Ole and Ostrowski, Sisse Rye and Erikstrup, Christian and Pedersen, Ole Birger Vesterager and Nyegaard, Mette and DBDS Genomic Consortium and Brunak, S{\o}ren and Vilhj{\'a}lmsson, Bjarni J and Rasmussen, Simon},
  title   = {{Deep integrative models for large-scale human genomics}},
  journal = {Nucleic Acids Research},
  month   = {05},
  year    = {2023}
}

@article{sigurdsson2024non,
  title     = {Non-linear genetic regulation of the blood plasma proteome},
  author    = {Sigurdsson, Arnor I and Gr{\"a}f, Justus F and Yang, Zhiyu and Ravn, Kirstine and Meisner, Jonas and Thielemann, Roman and Webel, Henry and Smit, Roelof AJ and Niu, Lili and Mann, Matthias and others},
  journal   = {medRxiv},
  pages     = {2024--07},
  year      = {2024},
  publisher = {Cold Spring Harbor Laboratory Press}
}

@article{sigurdsson2022improved,
  author    = {Sigurdsson, Arnor Ingi and Ravn, Kirstine and Winther, Ole and Lund, Ole and Brunak, S{\o}ren and Vilhjalmsson, Bjarni J and Rasmussen, Simon},
  title     = {Improved prediction of blood biomarkers using deep learning},
  journal   = {medRxiv},
  pages     = {2022--10},
  year      = {2022},
  publisher = {Cold Spring Harbor Laboratory Press}
}
```
## Acknowledgements

Massive thanks to everyone publishing and developing the packages this project directly and indirectly depends on.