Supervised modelling, sequence generation, and array output on genotype, tabular, sequence, image, array, and binary input data.
WARNING: This project is in alpha phase. Expect backwards incompatible changes and API changes.
- Install
- Usage
- Use Cases
- Features
- Supported Inputs and Outputs
- Related Projects
- Citation
- Acknowledgements
```bash
pip install eir-dl
```
Important: The latest version of EIR supports Python 3.11. Using an older version of Python will install an outdated version of EIR, which will likely be incompatible with the current documentation and may contain bugs. Please ensure that you are installing EIR in a Python 3.11 environment.
Here's an example with Docker:
```bash
docker build -t eir:latest https://raw.githubusercontent.com/arnor-sigurdsson/EIR/master/Dockerfile
docker run -d --name eir_container eir:latest
docker exec -it eir_container bash
```
Please refer to the Documentation for examples and information.
EIR allows for training and evaluating various deep-learning models directly from the command line. This can be useful for:
- Quick prototyping and iteration when doing supervised modelling or sequence generation on new datasets.
- Establishing baselines to compare against other methods.
- Fitting on data sources such as large-scale genomics, where DL implementations are not commonly available.
If you are an ML/DL researcher developing new models, EIR might not fit your use case. However, it can provide a quick baseline to compare against the methods you are developing, and some degree of customization is possible.
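To illustrate what training from the command line looks like in practice, the sketch below defines small `.yaml` configuration files and passes them to `eirtrain`. All file names, data paths, and the target column are placeholders, and the exact configuration keys may differ between EIR versions, so consult the documentation for the authoritative schema.

```yaml
# globals.yaml -- run-level settings (illustrative values)
output_folder: runs/my_first_run
n_epochs: 10

# input_genotype.yaml -- one input modality (paths are placeholders)
input_info:
  input_source: data/genotype_arrays/
  input_name: genotype
  input_type: omics

# output.yaml -- supervised target (column name is a placeholder)
output_info:
  output_source: data/labels.csv
  output_name: phenotype_output
  output_type: tabular
output_type_info:
  target_cat_columns:
    - Phenotype
```

With the sections above saved as separate files, a training run would then be launched with something like `eirtrain --global_configs globals.yaml --input_configs input_genotype.yaml --output_configs output.yaml`.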
- General
  - Train models directly from the command line through `.yaml` configuration files.
  - Training on genotype, tabular, sequence, image, array, and binary input data, with various modality-specific settings available.
  - Seamless multi-modal training (e.g., combining text + image + tabular data, or any combination of the modalities above).
  - Training multiple feature extractors on the same data source, e.g., combining a vanilla transformer, a Longformer, and a pre-trained BERT variant for text classification.
  - Support for checkpointing and continued training, as well as pretraining and transferring parts of trained models to new tasks.
- Supervised Learning
  - Supports continuous (i.e., regression) and categorical (i.e., classification) targets.
  - Multi-task / multi-label prediction supported out of the box.
  - Model explainability for genotype, tabular, sequence, image, and array data built in.
  - Computes and graphs various evaluation metrics during training (e.g., RMSE, PCC, and R² for regression tasks; accuracy, ROC-AUC, etc. for classification tasks).
- Sequence Generation
  - Supports various sequence generation tasks, including basic sequence generation, sequence-to-sequence transformations, and image-to-sequence transformations. For more information, refer to the respective tutorials: sequence generation, sequence to sequence, image to sequence, and tabular to sequence.
- Array Output
  - Supports array output tasks, such as building simple autoencoders for tasks like MNIST digit generation.
- Many more settings and configurations (e.g., augmentation, regularization, optimizers) available.
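To make the multi-modal and multi-extractor points above concrete, here is a hypothetical sketch of two input configurations attached to the same run: one tabular input and one sequence input. All names, paths, columns, and `model_type` values are placeholders for illustration; check the documentation for the feature extractors actually available in your EIR version.

```yaml
# input_tabular.yaml -- tabular modality (placeholder paths and columns)
input_info:
  input_source: data/clinical.csv
  input_name: clinical
  input_type: tabular
input_type_info:
  input_cat_columns:
    - Sex
  input_con_columns:
    - Age

# input_text.yaml -- sequence modality with a chosen feature extractor
input_info:
  input_source: data/notes/
  input_name: notes
  input_type: sequence
model_config:
  model_type: sequence-default  # placeholder; e.g., a built-in transformer variant
```

Passing both files to a single run, e.g. via `--input_configs input_tabular.yaml input_text.yaml`, would combine the two modalities in one model.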
| Modality | Input | Output |
|----------|-------|--------|
| Genotype | x     | †      |
| Tabular  | x     | x      |
| Sequence | x     | x      |
| Image    | x     | †      |
| Array    | x     | x      |
| Binary   | x     |        |
† While not directly supported, genotype and image modalities can be treated as arrays. For an example, see the MNIST Digit Generation tutorial.
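As a hypothetical illustration of the † note above, an image-generation task such as MNIST digit generation could be framed as array output by storing the images as arrays on disk and pointing an output configuration at them. The path, name, and keys below are placeholders; see the MNIST Digit Generation tutorial for the actual setup.

```yaml
# output_array.yaml -- images saved as arrays become an array output target
output_info:
  output_source: data/mnist_as_arrays/
  output_name: mnist_digits
  output_type: array
```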
- EIR-auto-GP: Automated genomic prediction (GP) using deep learning models with EIR.
If you use EIR in a scientific publication, we would appreciate it if you could use one of the following citations:
@article{10.1093/nar/gkad373,
author = {Sigurdsson, Arn{\'o}r I and Louloudis, Ioannis and Banasik, Karina and Westergaard, David and Winther, Ole and Lund, Ole and Ostrowski, Sisse Rye and Erikstrup, Christian and Pedersen, Ole Birger Vesterager and Nyegaard, Mette and DBDS Genomic Consortium and Brunak, S{\o}ren and Vilhj{\'a}lmsson, Bjarni J and Rasmussen, Simon},
title = {{Deep integrative models for large-scale human genomics}},
journal = {Nucleic Acids Research},
month = {05},
year = {2023}
}
@article{sigurdsson2022improved,
author = {Sigurdsson, Arnor Ingi and Ravn, Kirstine and Winther, Ole and Lund, Ole and Brunak, S{\o}ren and Vilhjalmsson, Bjarni J and Rasmussen, Simon},
title = {Improved prediction of blood biomarkers using deep learning},
journal = {medRxiv},
pages = {2022--10},
year = {2022},
publisher = {Cold Spring Harbor Laboratory Press}
}
Massive thanks to everyone publishing and developing the packages this project directly and indirectly depends on.