Test tutorial #1

Merged
merged 11 commits into from
May 30, 2024
45 changes: 45 additions & 0 deletions .github/workflows/test_tutorial.yaml
@@ -0,0 +1,45 @@
name: Test tutorial and publish website

on:
  - push

jobs:
  test_tutorial:
    name: Check tutorial
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install Claster requirements
        run: |
          pip install numpy pandas matplotlib jupyter ipykernel eir-dl
      - name: Run papermill for cmd-line execution of notebooks
        run: |
          pip install papermill
      - name: Test Tutorial
        run: |
          cd scripts
          papermill 0_Tutorial.ipynb 0_Tutorial_out.ipynb -p epochs 4
  website:
    name: Publish notebooks as website
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          pip install sphinx sphinx-book-theme myst-nb
      - name: Build website
        run: |
          sphinx-build -n --keep-going -b html ./ ./_build/
      - name: Publish workflow as website
        uses: peaceiris/actions-gh-pages@v4
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: _build
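The `Test Tutorial` step runs the notebook through papermill and overrides the `epochs` parameter. As a rough sketch, the same command line could be assembled and launched from a Python script. The helper names `build_papermill_command` and `run_tutorial` are hypothetical and not part of this PR; only the `papermill <input> <output> -p <name> <value>` form is taken from the workflow.

```python
import shlex
import subprocess


def build_papermill_command(notebook, output, parameters):
    """Assemble the papermill CLI call used in the workflow (hypothetical helper)."""
    command = ["papermill", notebook, output]
    for name, value in parameters.items():
        # Each parameter is passed as `-p <name> <value>`, as in the workflow step.
        command += ["-p", name, str(value)]
    return command


def run_tutorial(command, workdir="scripts"):
    """Run the command inside `workdir` (assumes papermill is installed there)."""
    # check=True raises CalledProcessError if the notebook run fails.
    subprocess.run(command, cwd=workdir, check=True)


command = build_papermill_command(
    "0_Tutorial.ipynb", "0_Tutorial_out.ipynb", {"epochs": 4}
)
print(shlex.join(command))
```

Calling `run_tutorial(command)` from the repository root would mirror the `cd scripts` line of the workflow step.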
21 changes: 9 additions & 12 deletions Readme.md
@@ -1,31 +1,31 @@
# <center>CLASTER
## <center> Modeling nascent RNA transcription from chromatin landscape and structure <center>

**Abstract**
## Abstract

_Different cell types and their associated functionalities can emerge from a single genomic sequence when certain regions are expressed while others remain silenced. The study of gene regulation and its potential malfunctioning in different cellular contexts is hence pivotal to understand both development and disease. We present the Chromatin Landscape and Structure to Expression Regressor (CLASTER), an epigenetic-based deep neural network that can integrate different data modalities describing the chromatin landscape and its 3D structure in their raw format. CLASTER effectively translates them into nascent transcription levels measured by EU-seq at a kilobasepair resolution. Our predictions reached a Pearson correlation with targets above r=0.86 at both bin and gene levels, without relying on DNA sequence nor explicitly extracted chromatin features. The model mostly used the information found within 10 kbp of the predicted locus to perform the predictions, even when a wide genomic region of 1 Mbp was available. Explicit modeling of long-range interactions using multi-headed attention and high-resolution chromatin contact maps had little impact on model performance, despite the model correctly identifying elements in these inputs influencing nascent transcription. The trained model then served as a platform to predict the transcriptional impact of simulated epigenetic silencing perturbations. Our results point towards a rather local, integrative and combinatorial paradigm of gene regulation, where changes in the chromatin environment surrounding a gene shape its context-specific transcription. We conclude that the predominant locality and limitations of current machine learning approaches might emerge as a genuine signature of genomic organization, having broad implications for future modeling approaches._

![Claster image](./images/Claster_image.png)
![Claster image](https://raw.githubusercontent.com/RasmussenLab/CLASTER/master/images/Claster_image.png)

**CLASTER overview:** CLASTER integrates the chromatin landscape (accessibility, promoter and enhancer activities and chromatin silencing) and structure (Micro-C) to predict nascent transcription levels measured by EU-seq.
**CLASTER overview** CLASTER integrates the chromatin landscape (accessibility, promoter and enhancer activities and chromatin silencing) and structure (Micro-C) to predict nascent transcription levels measured by EU-seq.

## In this repository

This repository contains the files and scripts required to reproduce the results of the paper and a short tutorial. The repository consists of the following folders:

```configurations```:
### `configurations`
- Configuration files (.yaml) required to build different flavours of CLASTER.

```images```:
### `images`
- Overview of CLASTER's architecture.

```inputs```:
### `inputs`

The folder contains the test set inputs for both data modalities, i.e. samples covering regions of 1 Mbp centered at the TSS of protein-coding genes on chr4 (in mice). They are used in the tutorial to illustrate how to train and validate CLASTER.
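
To make the "1 Mbp centered at the TSS" description concrete, here is a minimal sketch of the window arithmetic. The function name `tss_window`, the half-open coordinate convention and the example position are assumptions for illustration; they are not taken from the repository's scripts.

```python
def tss_window(tss: int, width: int = 1_000_000) -> tuple[int, int]:
    """Return a half-open [start, end) window of `width` bp centered at `tss`.

    Hypothetical helper: the coordinate convention is an assumption.
    """
    half = width // 2
    return tss - half, tss + half


# Made-up TSS position, used only to show the arithmetic.
start, end = tss_window(55_000_000)
print(start, end, end - start)
```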

```scripts```:
### `scripts`

- [`0_Tutorial.ipynb`](scripts/0_Tutorial.ipynb): The notebook provides a rapid overview of the most important steps in CLASTER's pipeline, including training and validating the network using the EIR framework.
- [`0_Tutorial.ipynb`](https://github.com/RasmussenLab/CLASTER/blob/master/scripts/0_Tutorial.ipynb): The notebook provides a rapid overview of the most important steps in CLASTER's pipeline, including training and validating the network using the EIR framework.
- `1_Data_obtention.ipynb`: This notebook guides the user through the data acquisition process, including:
    - Data download from publicly available repositories:
        - Inputs: Chromatin landscape (ATAC-seq, H3K4me3, H3K27ac and H3K27me3 in mESCs) and structure (Micro-C maps in mESCs)
@@ -45,9 +45,6 @@ These were used to benchmark CLASTER. It includes:
- Code to fine-tune Hyena-DNA's backbone and the added head together.
- `3_Data_analysis.ipynb`: The notebook contains the functions used to perform the data analysis and create the figures included in the manuscript.

```targets```:
### `targets`

The folder contains the target EU-seq profiles matching the input (test) samples.



99 changes: 99 additions & 0 deletions conf.py
@@ -0,0 +1,99 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.

# -- Project information -----------------------------------------------------

project = 'CLASTER'
copyright = '2022, Marc Pielies Avelli'
author = 'Marc Pielies Avelli'
version = '2024.05.29'
release = '2024.05.29'

# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.autodoc.typehints',
    'sphinx.ext.viewcode',
    'sphinx.ext.intersphinx',
    'myst_nb',
    'sphinx.ext.napoleon',
    # 'sphinx_new_tab_link',
]

# https://myst-nb.readthedocs.io/en/latest/computation/execute.html
nb_execution_mode = "off"

myst_enable_extensions = ["dollarmath", "amsmath"]

# Plotly support through the RequireJS JavaScript library
# https://myst-nb.readthedocs.io/en/latest/render/interactive.html#plotly
# html_js_files = ["https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"]

# https://myst-nb.readthedocs.io/en/latest/configuration.html
# Execution
nb_execution_raise_on_error = True
# Rendering
nb_merge_streams = True

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', 'jupyter_execute', '.DS_Store']


# Intersphinx options
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
    "scikit-learn": ("https://scikit-learn.org/stable/", None),
    "matplotlib": ("https://matplotlib.org/stable/", None),
}

# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
# See:
# https://github.com/executablebooks/MyST-NB/blob/master/docs/conf.py
html_title = "CLASTER"
html_theme = "sphinx_book_theme"
# html_logo = "_static/logo-wide.svg"
# html_favicon = "_static/logo-square.svg"
html_theme_options = {
    "github_url": "https://github.com/RasmussenLab/CLASTER",
    "repository_url": "https://github.com/RasmussenLab/CLASTER",
    "repository_branch": "master",
    "home_page_in_toc": True,
    # "path_to_docs": "",
    "show_navbar_depth": 1,
    "use_edit_page_button": True,
    "use_repository_button": True,
    "use_download_button": True,
    "launch_buttons": {
        "colab_url": "https://colab.research.google.com"
        # "binderhub_url": "https://mybinder.org",
        # "notebook_interface": "jupyterlab",
    },
    "navigation_with_keys": False,
}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = ["_static"]
23 changes: 23 additions & 0 deletions index.rst
@@ -0,0 +1,23 @@
Claster
=======
Modeling nascent RNA transcription from chromatin landscape and structure


.. include:: Readme.md
   :parser: myst_parser.sphinx_
   :start-line: 3

.. toctree::
   :maxdepth: 2
   :caption: Tutorial:

   scripts/0_Tutorial.ipynb

.. toctree::
   :maxdepth: 2
   :caption: Scripts:

   scripts/1_Data_obtention.ipynb
   scripts/2_Run_CLASTER.ipynb
   scripts/2b_Run_HyenaDNA_and_Enformer.ipynb
   scripts/3_Data_analysis.ipynb
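
The `:start-line: 3` option on the `include` directive counts lines from zero, so the first three lines of `Readme.md` (the two centered headings and the blank line after them) are skipped and inclusion starts at `## Abstract`. The sketch below only mimics that slicing in plain Python; `include_from` is a hypothetical helper, not a docutils or Sphinx function.

```python
def include_from(text: str, start_line: int) -> str:
    """Mimic `:start-line:`: keep the lines from 0-based index `start_line` on."""
    lines = text.splitlines(keepends=True)
    return "".join(lines[start_line:])


# First four lines of the updated Readme.md, as shown in this diff.
readme_head = (
    "# <center>CLASTER\n"
    "## <center> Modeling nascent RNA transcription from chromatin landscape and structure <center>\n"
    "\n"
    "## Abstract\n"
)

print(include_from(readme_head, 3))
```

This explains why `index.rst` repeats the title and subtitle itself: the README's own headings never reach the rendered page.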