🧹 Clean-up #56

Merged
merged 31 commits into from
Nov 17, 2022
31 commits
b4fa8b9
:fire: Delete unused training schemas
ri-heme Oct 26, 2022
46666c0
:fire: Delete deprecated model schema
ri-heme Oct 26, 2022
727c23c
:fire: Remove deprecated fields from schemas/configs
ri-heme Oct 26, 2022
0c0ede4
:fire: Delete training schema
ri-heme Oct 26, 2022
f3f818c
:fire: Remove deprecated modules
ri-heme Oct 26, 2022
82d3abf
:wrench: Configure isort
ri-heme Oct 26, 2022
5f9f07b
:art: Sort imports
ri-heme Oct 26, 2022
f74314a
:art: PEP8 Styling
ri-heme Oct 26, 2022
8f3c24b
:fire: Delete unused example config files
ri-heme Oct 26, 2022
1755b6f
:fire: Remove unused name field
ri-heme Oct 26, 2022
89a7b83
:wrench: Define default seed, add to schema
ri-heme Oct 26, 2022
186ee12
:rewind: Reinstate seed module
ri-heme Oct 26, 2022
64ed88b
:heavy_minus_sign: Remove dependencies
ri-heme Oct 26, 2022
820dc31
:memo: Type hinting, docstrings
ri-heme Oct 27, 2022
3fefdd5
:arrow_up: Set default Hydra version_base
ri-heme Oct 27, 2022
7e0e9ab
:sparkles: Update read config function for NBs
ri-heme Oct 27, 2022
6a59860
:bug: Fix dataloader creation
ri-heme Nov 7, 2022
0505f46
:bookmark: Update version number
ri-heme Nov 7, 2022
4aad974
:alembic: Resolve merge conflict
ri-heme Nov 7, 2022
c0e3150
:bug: Fix feature importance plot
ri-heme Nov 10, 2022
534bd94
:memo: Update tutorial NB num. 1
ri-heme Nov 9, 2022
b5a4eea
:memo: Update tutorial NB num. 4
ri-heme Nov 14, 2022
de29b6c
:memo: Update tutorial NB num. 04
ri-heme Nov 14, 2022
d733e3c
:rewind: Reinstate experiment config
ri-heme Nov 15, 2022
113d309
:fire: Remove unused tutorial files
ri-heme Nov 15, 2022
0b36f4c
:memo: Update tutorial NBs num. 2 & 5
ri-heme Nov 15, 2022
9c5e017
:twisted_rightwards_arrows: :memo: :art: :recycle: Set-up docs, refac…
valentas1 Nov 15, 2022
4123d2d
:memo: Update README
ri-heme Nov 17, 2022
d4bee66
:art: Sort imports
ri-heme Nov 17, 2022
dac748f
:bookmark: Update version number
ri-heme Nov 17, 2022
c69bd47
:twisted_rightwards_arrows: Fix merge conflict
ri-heme Nov 17, 2022
8 changes: 8 additions & 0 deletions .gitignore
@@ -35,3 +35,11 @@ tutorial/interim_data/
tutorial/processed_data/
tutorial/results/
tutorial/maize/data

# Virtual environment
venv/
virtualvenv/

# docs files
docs/build/
docs/source/_templates/
79 changes: 56 additions & 23 deletions README.md
@@ -1,8 +1,18 @@
# MOVE (Multi-Omics Variational autoEncoder)

The code in this repository can be used to run our Multi-Omics Variational autoEncoder (MOVE) framework for integration of omics and clinical variables spanning both categorical and continuous data. Our approach includes training ensemble VAE models and using in-silico perturbation experiments to identify cross-omics associations. The manuscript has been accepted and we will link to it when it is published.

We developed the method based on a Type 2 Diabetes cohort from the IMI DIRECT project containing 789 newly diagnosed T2D patients. The cohort and data creation is described in [Koivula et al.](https://dx.doi.org/10.1007%2Fs00125-019-4906-1) and [Wesolowska-Andersen and Brorsson et al.](https://doi.org/10.1016/j.xcrm.2021.100477). For the analysis we included the following data:
The code in this repository can be used to run our Multi-Omics Variational
autoEncoder (MOVE) framework for integration of omics and clinical variables
spanning both categorical and continuous data. Our approach includes training
ensemble VAE models and using *in silico* perturbation experiments to identify
cross-omics associations. The manuscript has been accepted and we will provide
the link when it is published.

We developed the method based on a Type 2 Diabetes cohort from the IMI DIRECT
project containing 789 newly diagnosed T2D patients. The cohort and data
creation is described in
[Koivula et al.](https://dx.doi.org/10.1007%2Fs00125-019-4906-1) and
[Wesolowska-Andersen et al.](https://doi.org/10.1016/j.xcrm.2021.100477). For
the analysis we included the following data:

Multi-omics data sets:
```
@@ -25,17 +35,21 @@ Medication data

## Installing MOVE package

MOVE is written in python and can therefore be installed using:
MOVE is written in Python and can therefore be installed using `pip`:

```
pip install move-dl
```bash
>>> pip install move-dl
```

## Requirements

MOVE runs on Mac, Windows and Linux using python. The variational autoencoder framework is implemented in pytorch, but everything should be installed for you using pip. The only exception to that is that if you want to use the jupyter notebooks you have to install jupyter yourself.
MOVE should run on any environment where Python is available. The variational
autoencoder architecture is implemented in PyTorch.

The training of the VAEs can be done using CPUs only or GPU acceleration. If you do not have powerful GPUs available it is perfectly fine to run using only CPUs. For instance, the tutorial data set consisting of simulated drug, metabolomics and proteomics data for 500 individuals runs fine on a standard MacBook.
The training of the VAEs can be done using CPUs only or GPU acceleration. If
you do not have powerful GPUs available, it is possible to run using only CPUs.
For instance, the tutorial data set consisting of simulated drug, metabolomics
and proteomics data for 500 individuals runs fine on a standard MacBook.

# The MOVE pipeline

@@ -47,35 +61,54 @@ MOVE has five-six steps:
03. Finding the right architecture of the network focusing on stability of the model
04. Use model, determined from steps 02-03, to create and analyze the latent space
05. Identify associations between categorical and continuous datasets
05a. Using an ensemble of VAEs with the T-test approach
05a. Using an ensemble of VAEs with the t-test approach
05b. Using an ensemble of VAEs with the Bayesian decision theory approach
06. If both 5a and 5b were run, select the overlap between them
```

## How to run MOVE

You can run the move-dl pipeline using the command line or using Jupyter notebooks. Notebooks with explanations are in the [tutorial](https://github.com/RasmussenLab/MOVE/tree/developer/tutorial) folder. Feel free to open an issue for help.
You can run the move-dl pipeline from the command line or within a Jupyter
notebook.

You can run MOVE as Python module with the following commands:
```
python -m move.01_encode_data
python -m move.02_optimize_reconstruction
python -m move.03_optimize_stability
python -m move.04_analyze_latent
python -m move.05_identify_associations
You can run MOVE as a Python module with the following command. Details on how
to set up the configuration for the data and task can be found in our
[tutorial](https://github.com/RasmussenLab/MOVE/tree/main/tutorial) folder.

```bash
>>> move-dl data=[name of data config] task=[name of task config]
```

Feel free to
[open an issue](https://github.com/RasmussenLab/MOVE/issues/new/choose) if you
need any help.
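
Step 06 of the pipeline (selecting the overlap between the t-test and Bayesian
approaches) amounts to a set intersection. The following is a minimal sketch,
not MOVE's own implementation: the feature names and the `select_overlap`
helper are hypothetical, and each association is assumed to be a
(perturbed feature, affected feature) pair.

```python
# Hypothetical association results from steps 05a and 05b.
# Each pair is (perturbed feature, affected feature); the names are made up.
ttest_hits = {
    ("drug_1", "protein_7"),
    ("drug_1", "metabolite_3"),
    ("drug_2", "protein_2"),
}
bayes_hits = {
    ("drug_1", "protein_7"),
    ("drug_2", "protein_2"),
    ("drug_3", "protein_9"),
}


def select_overlap(first, second):
    """Return the associations reported by both approaches, sorted for stable output."""
    return sorted(set(first) & set(second))


overlap = select_overlap(ttest_hits, bayes_hits)
print(overlap)
```

Keeping only associations found by both approaches trades sensitivity for
agreement between the two methods.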

## How to use MOVE with your data
### How to use MOVE with your data

Your data files should be tab separated, include a header and the first column should be the IDs of your samples. The configuration of MOVE is done using yaml files that describe the input data (data.yaml), the model (model.yaml) and files associated with each of the steps (tuning_reconstruction.yaml, tuning_stability.yaml, training_latent.yaml, training_association.yaml). These should be placed in the working directory. Please see the [tutorial](https://github.com/RasmussenLab/MOVE/tree/developer/tutorial) for more information.
Your data files should be tab separated, include a header and the first column
should be the IDs of your samples. The configuration of MOVE is done using YAML
files that describe the input data and the task specification. These should be
placed in a `config` directory in the working directory. Please see the
[tutorial](https://github.com/RasmussenLab/MOVE/tree/main/tutorial)
for more information.
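
As a rough illustration of the expected input layout (tab-separated, one header
row, sample IDs in the first column), the sketch below parses and checks such a
table using only the standard library. It is not MOVE's own loader: the
`read_samples` helper and the column names are assumptions made for this
example.

```python
import csv
import io


def read_samples(handle):
    """Parse a tab-separated table whose first column holds sample IDs.

    Returns (feature_names, rows), where rows maps sample ID -> list of floats.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)        # the header row is required
    feature_names = header[1:]   # the first column is the sample ID
    rows = {}
    for record in reader:
        if not record:
            continue  # skip blank lines
        sample_id, values = record[0], record[1:]
        if len(values) != len(feature_names):
            raise ValueError(
                f"Sample {sample_id!r} has {len(values)} values, "
                f"expected {len(feature_names)}"
            )
        if sample_id in rows:
            raise ValueError(f"Duplicate sample ID: {sample_id!r}")
        rows[sample_id] = [float(value) for value in values]
    return feature_names, rows


# Hypothetical two-sample table with made-up feature names.
example = "ID\tglucose\tinsulin\nsample_1\t5.4\t10.1\nsample_2\t6.2\t8.7\n"
features, samples = read_samples(io.StringIO(example))
```

Running a check like this before pointing the data configuration at a file
surfaces ragged rows or duplicated IDs early.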


# Data sets

## DIRECT data set
The data used in notebooks are not available for testing due to the informed consent given by study participants, the various national ethical approvals for the study, and the European General Data Protection Regulation (GDPR). Therefore, individual-level clinical and omics data cannot be transferred from the centralized IMI-DIRECT repository. Requests for access to summary statistics IMI-DIRECT data, including those presented here, can be made to [email protected]. Requesters will be informed on how summary-level data can be accessed via the DIRECT secure analysis platform following submission of appropriate application. The IMI-DIRECT data access policy is available at [here](https://directdiabetes.org).

## Simulated and publicly available data set
We have therefore added a simulated data set that can be used for testing the workflow and a publicly available maize rhizosphere microbiome data set. We have also included a notebook that goes through a short [tutorial](https://github.com/RasmussenLab/MOVE/tree/developer/tutorial) with a publicly-available maize rhizosphere microbiome dataset.

The data used in notebooks are not available for testing due to the informed
consent given by study participants, the various national ethical approvals for
the study, and the European General Data Protection Regulation (GDPR).
Therefore, individual-level clinical and omics data cannot be transferred from
the centralized IMI-DIRECT repository. Requests for access to IMI-DIRECT
summary statistics, including those presented here, can be made to
[email protected]. Requesters will be informed on how summary-level
data can be accessed via the DIRECT secure analysis platform following
submission of an appropriate application. The IMI-DIRECT data access policy is
available [here](https://directdiabetes.org).

## Simulated and publicly available data sets

We have therefore provided two data sets to test the workflow: a simulated
data set and a publicly available maize rhizosphere microbiome data set.
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.https://www.sphinx-doc.org/
	exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
2 changes: 2 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,2 @@
sphinx==5.3.0
sphinx_rtd_theme==1.1.1
40 changes: 40 additions & 0 deletions docs/source/conf.py
@@ -0,0 +1,40 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

import sys
from pathlib import Path

sys.path.insert(0, str(Path("../src").resolve()))

project = "move-dl"
copyright = "2022, Valentas Brasas, Ricardo Hernandez Medina"
author = "Valentas Brasas, Ricardo Hernandez Medina"
release = "1.0.0"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "sphinx.ext.napoleon",
]

templates_path = ["_templates"]
exclude_patterns = []

# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = "sphinx_rtd_theme"
html_static_path = []

# -- Napoleon settings --------------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html#configuration

napoleon_google_docstring = True
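
With `sphinx.ext.napoleon` enabled and `napoleon_google_docstring = True`,
autodoc can render Google-style docstrings. The sketch below shows the section
layout napoleon recognizes (`Args`, `Returns`, `Raises`); the `scale` function
is a hypothetical example and not part of move-dl.

```python
from typing import List


def scale(values: List[float], factor: float = 1.0) -> List[float]:
    """Multiply every value by a constant factor.

    Args:
        values: Numbers to scale.
        factor: Multiplier applied to each value.

    Returns:
        A new list with the scaled values.

    Raises:
        ValueError: If ``values`` is empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    return [value * factor for value in values]
```

Because the types live in the annotations, the docstring sections only need to
describe meaning, which keeps them short.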
15 changes: 15 additions & 0 deletions docs/source/index.rst
@@ -0,0 +1,15 @@
.. move-dl documentation master file, created by
   sphinx-quickstart on Sat Nov 5 15:48:56 2022.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to move-dl's documentation!
===================================

.. toctree::
   :maxdepth: 1
   :caption: Contents:

   pages/installation
   pages/tutorial
   pages/api/API
9 changes: 9 additions & 0 deletions docs/source/pages/api/API.rst
@@ -0,0 +1,9 @@
API
===

.. toctree::
   :maxdepth: 2

   configuration_schemas.rst
   functions.rst
   models.rst
2 changes: 2 additions & 0 deletions docs/source/pages/api/configuration_schemas.rst
@@ -0,0 +1,2 @@
Configuration schemas
=====================
2 changes: 2 additions & 0 deletions docs/source/pages/api/functions.rst
@@ -0,0 +1,2 @@
Functions
=========
2 changes: 2 additions & 0 deletions docs/source/pages/api/models.rst
@@ -0,0 +1,2 @@
Models
======
2 changes: 2 additions & 0 deletions docs/source/pages/installation.rst
@@ -0,0 +1,2 @@
Installation
============
2 changes: 2 additions & 0 deletions docs/source/pages/tutorial.rst
@@ -0,0 +1,2 @@
Tutorial(s)
============
4 changes: 4 additions & 0 deletions pyproject.toml
@@ -1,3 +1,7 @@
[build-system]
requires = ["setuptools>=42", "wheel"]
build-backend = "setuptools.build_meta"

[tool.isort]
multi_line_output = 3
include_trailing_comma = true
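
For reference, `multi_line_output = 3` selects isort's "vertical hanging indent"
mode, and `include_trailing_comma = true` adds a comma after the last imported
name. A sketch of the expected effect on an import that exceeds the line-length
limit is shown below; the import itself is an arbitrary standard-library
example, and the exact output may differ with other isort settings.

```python
# Before isort (a single line longer than isort's default 79-character limit):
# from collections import ChainMap, Counter, OrderedDict, UserDict, defaultdict, deque, namedtuple

# After isort with multi_line_output = 3 and include_trailing_comma = true,
# the import should look roughly like this:
from collections import (
    ChainMap,
    Counter,
    OrderedDict,
    UserDict,
    defaultdict,
    deque,
    namedtuple,
)
```

The trailing comma keeps later diffs small: adding a name touches one line only.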
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,4 +1,4 @@
hydra-core>=1.1.0
hydra-core>=1.2.0
numpy>=1.19.5
pandas>=1.1.5
torch==1.9.0
3 changes: 1 addition & 2 deletions setup.cfg
@@ -18,10 +18,9 @@ install_requires =
    numpy
    pandas
    torch
    statsmodels
    umap-learn
    matplotlib
    seaborn

package_dir =
    = src
packages = find:
Empty file.
52 changes: 0 additions & 52 deletions src/move/01_encode_data/__main__.py

This file was deleted.

Empty file.