This repository contains the scripts (in jupyter notebooks) to generate the figure in the manuscript "BGCFlow: Systematic pangenome workflow for the analysis of biosynthetic gene clusters across large genomic datasets".
git clone https://github.com/matinnuhamunada/saccharopolyspora_manuscript.git
# create and activate new conda environment
conda create -n bgcflow pip -y
conda activate bgcflow
# install BGCFlow wrapper
pip install git+https://github.com/NBChub/bgcflow_wrapper.git
# clone BGCFlow to "bgcflow" folder
bgcflow clone bgcflow
- Donwload the dataset containing the BGCFlow runs from Zenodo
# move to bgcflow dir
cd bgcflow
# download and extract dataset
wget https://zenodo.org/record/8018055/files/saccharopolyspora_dataset.zip
unzip saccharopolyspora_dataset.zip
# go back to the manuscript dir
cd ../saccharopolyspora_manuscript/
# edit the location of the bgcflow dir to the right directory
nano config.yaml
Install these conda environments:
mamba env create -f python_notebook.yaml
mamba env create -f r_notebook.yaml
mamba env create -f <bgcflow_dir>/workflow/envs/cblaster.yaml
- There are two kind of notebooks, R (.R.ipynb) and python (.python.ipynb)
- Run the notebook using the corresponding conda environment:
python_notebook
orr_notebook
- Start jupyter session
# for python
conda activate python_notebook
jupyter lab
# for R
conda activate r_notebook
jupyter lab
- Run the notebooks in order
Matin Nuhamunada, Omkar S. Mohite, Patrick V. Phaneuf, Bernhard O. Palsson, and Tilmann Weber. (2023). BGCFlow: Systematic pangenome workflow for the analysis of biosynthetic gene clusters across large genomic datasets. bioRxiv 2023.06.14.545018; doi: https://doi.org/10.1101/2023.06.14.545018
Nuhamunada, Matin, & Mohite, Omkar Satyavan. (2023). BGCFlow Analysis of Saccharopolyspora Genomes (0.1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8018055