RasmussenLab · ri-heme · Nov 17, 2022 · Oct 26, 2022
diff --git a/.gitignore b/.gitignore
@@ -35,3 +35,11 @@ tutorial/interim_data/
 tutorial/processed_data/
 tutorial/results/
 tutorial/maize/data
+
+# Virtual environment
+venv/
+virtualvenv/
+
+# docs files
+docs/build/
+docs/source/_templates/
diff --git a/README.md b/README.md
@@ -1,8 +1,18 @@
 # MOVE (Multi-Omics Variational autoEncoder)
 
-The code in this repository can be used to run our Multi-Omics Variational autoEncoder (MOVE) framework for integration of omics and clinical variabels spanning both categorial and continuous data. Our approach includes training ensemble VAE models and using in-silico perturbation experiments to identify cross omics associations. The manuscript has been Accepted and we will link when it is published.
-
-We developed the method based on a Type 2 Diabetes cohort from the IMI DIRECT project containing 789 newly diagnosed T2D patients. The cohort and data creation is described in [Koivula et al.](https://dx.doi.org/10.1007%2Fs00125-019-4906-1) and [Wesolowska-Andersen and Brorsson et al.](https://doi.org/10.1016/j.xcrm.2021.100477). For the analysis we included the following data:
+The code in this repository can be used to run our Multi-Omics Variational
+autoEncoder (MOVE) framework for integration of omics and clinical variables
+spanning both categorical and continuous data. Our approach includes training
+ensemble VAE models and using *in silico* perturbation experiments to identify
+cross-omics associations. The manuscript has been accepted and we will provide
+the link when it is published.
+
+We developed the method based on a Type 2 Diabetes cohort from the IMI DIRECT
+project containing 789 newly diagnosed T2D patients. The cohort and data
+creation is described in
+[Koivula et al.](https://dx.doi.org/10.1007%2Fs00125-019-4906-1) and
+[Wesolowska-Andersen et al.](https://doi.org/10.1016/j.xcrm.2021.100477). For
+the analysis we included the following data:
 
 Multi-omics data sets:
 ```
@@ -25,17 +35,21 @@ Medication data
 
 ## Installing MOVE package
 
-MOVE is written in python and can therefore be installed using:
+MOVE is written in Python and can therefore be installed using `pip`:
 
-```
-pip install move-dl
+```bash
+pip install move-dl
 ```
 
 ## Requirements
 
-MOVE runs on Mac, Windows and Linux using python. The variational autoencoder framework is implemented in pytorch, but everything should be installed for you using pip. The only exception to that is that if you want to use the jupyter notebooks you have to install jupyter yourself.
+MOVE should run in any environment where Python is available. The variational
+autoencoder architecture is implemented in PyTorch.
 
-The training of the VAEs can be done using CPUs only or GPU acceleration. If you dont have powerful GPUs available it is perfectly fine to run using only CPUs. For instance, the tutorial data set consisting of simulated drug, metabolomics and proteomics data for 500 individuals runs fine on a standard macbook.
+The training of the VAEs can be done using CPUs only or GPU acceleration. If
+you do not have powerful GPUs available, it is possible to run using only CPUs.
+For instance, the tutorial data set consisting of simulated drug, metabolomics
+and proteomics data for 500 individuals runs fine on a standard MacBook.
 
 # The MOVE pipeline
 
@@ -47,35 +61,54 @@ MOVE has five-six steps:
 03. Finding the right architecture of the network focusing on stability of the model
 04. Use model, determined from steps 02-03, to create and analyze the latent space
 05. Identify associations between a categorical and continuous datasets
-05a. Using an ensemble of VAEs with the T-test approach
+05a. Using an ensemble of VAEs with the t-test approach
 05b. Using an ensemble of VAEs with the Bayesian decision theory approach
 06. If both 5a and 5b were run select the overlap between them
 ```
 
 ## How to run MOVE
 
-You can run the move-dl pipeline using the command line or using Jupyter notebooks. Notebooks with explanations are in the [tutorial](https://github.com/RasmussenLab/MOVE/tree/developer/tutorial) folder. Feel free to open an issue for help.
+You can run the move-dl pipeline from the command line or within a Jupyter
+notebook.
 
-You can run MOVE as Python module with the following commands: 
-```
-python -m move.01_encode_data 
-python -m move.02_optimize_reconstruction
-python -m move.03_optimize_stability
-python -m move.04_analyze_latent
-python -m move.05_identify_associations
+You can run MOVE with the following command. Details on how
+to set up the configuration for the data and task can be found in our
+[tutorial](https://github.com/RasmussenLab/MOVE/tree/main/tutorial) folder.
+
+```bash
+move-dl data=[name of data config] task=[name of task config]
 ```
 
+Feel free to
+[open an issue](https://github.com/RasmussenLab/MOVE/issues/new/choose) if you
+need any help.
 
-## How to use MOVE with your data
+### How to use MOVE with your data
 
-Your data files should be tab separated, include a header and the first column should be the IDs of your samples. The configuration of MOVE is done using yaml files that describe the input data (data.yaml), the model (model.yaml) and files associated with each of the steps (tuning_reconstruction.yaml, tuning_stability.yaml, training_latent.yaml, training_association.yaml). These should be placed in the working directory. Please see the [tutorial](https://github.com/RasmussenLab/MOVE/tree/developer/tutorial) for more information.
+Your data files should be tab-separated and include a header, and the first
+column should contain the IDs of your samples. MOVE is configured using YAML
+files that describe the input data and the task specification. These should be
+placed in a `config` directory in the working directory. Please see the
+[tutorial](https://github.com/RasmussenLab/MOVE/tree/main/tutorial)
+for more information.
 
 
 # Data sets
 
 ## DIRECT data set
-The data used in notebooks are not available for testing due to the informed consent given by study participants, the various national ethical approvals for the study, and the European General Data Protection Regulation (GDPR). Therefore, individual-level clinical and omics data cannot be transferred from the centralized IMI-DIRECT repository. Requests for access to summary statistics IMI-DIRECT data, including those presented here, can be made to [email protected]. Requesters will be informed on how summary-level data can be accessed via the DIRECT secure analysis platform following submission of appropriate application. The IMI-DIRECT data access policy is available at [here](https://directdiabetes.org).
-
-## Simulated and publicaly available data set
-We have therefore added a simulated data set that can be used for testing the workflow and a publicly available maize rhizosphere microbiome data set. We have also included a notebook that goes through a short [tutorial](https://github.com/RasmussenLab/MOVE/tree/developer/tutorial) with a publicly-available maize rhizosphere microbiome dataset.
 
+The data used in notebooks are not available for testing due to the informed
+consent given by study participants, the various national ethical approvals for
+the study, and the European General Data Protection Regulation (GDPR).
+Therefore, individual-level clinical and omics data cannot be transferred from
+the centralized IMI-DIRECT repository. Requests for access to summary statistics
+IMI-DIRECT data, including those presented here, can be made to
+[email protected]. Requesters will be informed on how summary-level
+data can be accessed via the DIRECT secure analysis platform following
+submission of an appropriate application. The IMI-DIRECT data access policy is
+available [here](https://directdiabetes.org).
+
+## Simulated and publicly available data sets
+
+We have therefore provided two data sets to test the workflow: a simulated
+data set and a publicly available maize rhizosphere microbiome data set.
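
The input layout described in the README (tab-separated, a header row, sample IDs in the first column) can be checked before running the pipeline. The following is a minimal sketch using only the Python standard library. The helper name `read_samples` and the example column names are hypothetical and are not part of the move-dl package.

```python
import csv
import io


def read_samples(handle):
    """Read a tab-separated table whose first column holds sample IDs.

    Returns (feature_names, rows), where rows maps sample ID -> list of values.
    Illustrative helper only; it is not part of the move-dl package.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # first row: ID column name, then feature names
    feature_names = header[1:]
    rows = {}
    for record in reader:
        if not record:  # skip blank lines
            continue
        sample_id, values = record[0], record[1:]
        if sample_id in rows:
            raise ValueError(f"duplicate sample ID: {sample_id}")
        if len(values) != len(feature_names):
            raise ValueError(f"wrong number of columns for sample {sample_id}")
        rows[sample_id] = values
    return feature_names, rows


# Made-up example data in the expected layout
example = "ID\tprotein_1\tprotein_2\nsample_a\t0.1\t0.7\nsample_b\t0.3\t0.2\n"
features, table = read_samples(io.StringIO(example))
print(features)  # ['protein_1', 'protein_2']
print(table["sample_b"])  # ['0.3', '0.2']
```

Values are kept as strings here; converting them to numbers or categories is left to the actual encoding step.
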
diff --git a/docs/Makefile b/docs/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = source
+BUILDDIR      = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/make.bat b/docs/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=source
+set BUILDDIR=build
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.https://www.sphinx-doc.org/
+	exit /b 1
+)
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -0,0 +1,2 @@
+sphinx==5.3.0
+sphinx_rtd_theme==1.1.1
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -0,0 +1,40 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# For the full list of built-in configuration values, see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Project information -----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+
+import sys
+from pathlib import Path
+
+sys.path.insert(0, str(Path("../../src").resolve()))
+
+project = "move-dl"
+copyright = "2022, Valentas Brasas, Ricardo Hernandez Medina"
+author = "Valentas Brasas, Ricardo Hernandez Medina"
+release = "1.0.0"
+
+# -- General configuration ---------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
+
+extensions = [
+    "sphinx.ext.autodoc",
+    "sphinx.ext.autosummary",
+    "sphinx.ext.napoleon",
+]
+
+templates_path = ["_templates"]
+exclude_patterns = []
+
+# -- Options for HTML output -------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+
+html_theme = "sphinx_rtd_theme"
+html_static_path = []
+
+# -- Napoleon settings --------------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html#configuration
+
+napoleon_google_docstring = True
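
`conf.py` extends `sys.path` so that `sphinx.ext.autodoc` can import the package, so the inserted path has to resolve to the directory that contains `move`. Below is a minimal sketch, assuming the layout `docs/source/conf.py` and `src/move/` that appears in this diff. It derives that directory from the location of the configuration file rather than from the working directory. `find_src_dir` is a hypothetical helper name.

```python
from pathlib import Path


def find_src_dir(conf_file):
    """Return the repository's src directory given the path to conf.py.

    Assumes conf.py lives at <repo>/docs/source/conf.py, so the repository
    root is two levels above the directory containing conf.py.
    """
    conf_path = Path(conf_file)
    repo_root = conf_path.parents[2]  # source -> docs -> repo root
    return repo_root / "src"


src_dir = find_src_dir("/repo/docs/source/conf.py")
print(src_dir.as_posix())  # /repo/src
```

Inside `conf.py` this could be called as `find_src_dir(__file__)`, assuming `__file__` is available when the configuration is executed.
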
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -0,0 +1,15 @@
+.. move-dl documentation master file, created by
+   sphinx-quickstart on Sat Nov  5 15:48:56 2022.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+Welcome to move-dl's documentation!
+===================================
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents:
+
+   pages/installation
+   pages/tutorial
+   pages/api/API
diff --git a/docs/source/pages/api/API.rst b/docs/source/pages/api/API.rst
@@ -0,0 +1,9 @@
+API
+===
+
+.. toctree::
+    :maxdepth: 2
+
+    configuration_schemas.rst
+    functions.rst
+    models.rst
diff --git a/docs/source/pages/api/configuration_schemas.rst b/docs/source/pages/api/configuration_schemas.rst
@@ -0,0 +1,2 @@
+Configuration schemas
+=====================
diff --git a/docs/source/pages/api/functions.rst b/docs/source/pages/api/functions.rst
@@ -0,0 +1,2 @@
+Functions
+=========
diff --git a/docs/source/pages/api/models.rst b/docs/source/pages/api/models.rst
@@ -0,0 +1,2 @@
+Models
+======
diff --git a/docs/source/pages/installation.rst b/docs/source/pages/installation.rst
@@ -0,0 +1,2 @@
+Installation
+============
diff --git a/docs/source/pages/tutorial.rst b/docs/source/pages/tutorial.rst
@@ -0,0 +1,2 @@
+Tutorial(s)
+============
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,3 +1,7 @@
 [build-system]
 requires = ["setuptools>=42", "wheel"]
 build-backend = "setuptools.build_meta"
+
+[tool.isort]
+multi_line_output = 3
+include_trailing_comma = true
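
For reference, `multi_line_output = 3` selects isort's vertical hanging indent mode, and `include_trailing_comma = true` appends a comma after the last imported name. An import that is too long for one line would then be wrapped roughly as follows. The example is illustrative and not taken from the MOVE code base.

```python
from collections.abc import (
    AsyncGenerator,
    AsyncIterable,
    AsyncIterator,
    Awaitable,
    Callable,
)

# The wrapped form is an ordinary import; all names are available as usual
print(Callable.__name__)  # Callable
```
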
diff --git a/requirements.txt b/requirements.txt
@@ -1,4 +1,4 @@
-hydra-core>=1.1.0
+hydra-core>=1.2.0
 numpy>=1.19.5
 pandas>=1.1.5
 torch==1.9.0
diff --git a/setup.cfg b/setup.cfg
@@ -18,10 +18,9 @@ install_requires =
     numpy
     pandas
     torch
-    statsmodels
-    umap-learn
     matplotlib
     seaborn
+
 package_dir =
     = src
 packages = find:

diff --git a/src/move/01_encode_data/__init__.py b/src/move/01_encode_data/__init__.py
diff --git a/src/move/01_encode_data/__main__.py b/src/move/01_encode_data/__main__.py
diff --git a/src/move/02_optimize_reconstruction/__init__.py b/src/move/02_optimize_reconstruction/__init__.py