Resolves bugs in writing cif file and parity similarity calculation (#28)

* corrected the reversal of bondtype from zero to dative

* use mol from sanitized result

* enable parsing a subset of CCDs from the components cif file

* fix: fix error in writing cif file

This fixes the error in generating the InChIKey when the InChI is missing

* style: formatting

* chore: removed unnecessary None

* doc: add download badge and CLC feature

* chore: add citation file

* fix: write RDKit properties only if they are present

* fix: correct use of length of list in python

* fix: change bondtype from dative to zero for parity method

SMARTS with dative bonds fail to find substructures, so dative bond types are changed to zero

* test: add HEM for parity test

* chore: linting and formatting

* test: removed parity test using HEM

to check whether the segmentation fault in the GitHub workflow is caused by this test

* fix: remove PDBe from unichem mapping sources

* chore: Update dependencies and package management

Update the project's dependencies and package management to use Poetry instead of pip. This includes installing Poetry and running `poetry install --with tests` to install the project dependencies. Remove the installation of `rdkit==2023.09.6` and `pre-commit`.

* chore: Update pre-commit hooks and dependencies

Update the pre-commit hooks to use the Ruff pre-commit hook repository and remove the black and flake8 hooks. Also, add the rST Formatter hook from the rstfmt repository.

* chore: Update tests.yml to use Poetry for pre-commit and pytest commands

Update the tests.yml file to use Poetry for the pre-commit and pytest commands. This ensures that the project's dependencies are installed correctly and that the pre-commit hooks are run using Poetry. The changes include replacing "pre-commit install" with "poetry run pre-commit install" and "pytest --cov=pdbeccdutils" with "poetry run pytest --cov=pdbeccdutils".

* Minor formatting

* chore: Install pre-commit package in tests.yml

Add the installation of the pre-commit package in the tests.yml file to ensure that pre-commit hooks are run during the testing process. This will help catch and fix any code style or formatting issues before committing the changes.

* chore: Update docs and publish pipelines to use poetry

* test: add HEM to parity test

* bump up version number

* 🔥 remove __init__ file

* 🔧 update documentation action to use poetry

* 🔧 update publish action to use poetry

* 🔧 update test action to use poetry

* 🎨 move details from setup to pyproject.toml file

* ♻️ use version information from pyproject.toml

* 🩹 add CCDC to unichem resources

* ✏️ fix typos

* ✏️ fix typos

* ♻️ refactor configs

* 🎨 use single function to get properties of rdkit objects

* 🎨 use rdkit_object_property function instead of get_componet_atom_id

* ✨ get name of clc from entities

* 🩹 import importlib.metadata

* 🎨 linting and formatting

* 🩹 removed "data" from the path

* 🔧 add poetry.toml file

* 🔧 update poetry.lock file

* 🔧 remove pre-commit from test workflow

* 🎨 linting and formatting

* 📝 replace github downloads with pypi downloads

* bump up version

* 📝 update changelog for release 0.8.6

* 🔧 create hook to generate poetry.lock file

* 📝 update readme with installation using poetry

* 🔧 update the rdkit version number

* 🔧 update poetry lock file

---------

Co-authored-by: roshan <[email protected]>
Co-authored-by: Sreenath Sasidharan Nair <[email protected]>
3 people authored Oct 30, 2024
1 parent dbe3b87 commit 618f680
Showing 39 changed files with 1,626 additions and 235 deletions.
14 changes: 10 additions & 4 deletions .github/workflows/documentation.yml
```diff
@@ -20,12 +20,18 @@ jobs:
 uses: actions/setup-python@v4
 with:
 python-version: "3.10"
-- run: |
-pip install rdkit
-pip install -e ".[docs]"
+- name: Install poetry
+uses: abatilo/actions-poetry@v2
+- name: Define a cache for the virtual environment based on the dependencies lock file
+uses: actions/cache@v3
+with:
+path: ./.venv
+key: venv-${{ hashFiles('poetry.lock') }}
+- name: Install the package with doc dependencies
+run: poetry install --with docs
 - run: |
 cd doc
-make html
+poetry run sphinx-build -b html . _build/html
 - name: Deploy pages
 uses: peaceiris/actions-gh-pages@v3
 with:
```
17 changes: 11 additions & 6 deletions .github/workflows/publish.yml
```diff
@@ -25,14 +25,19 @@ jobs:
 uses: actions/setup-python@v3
 with:
 python-version: "3.10"
-- name: Install dependencies
-run: |
-python -m pip install --upgrade pip
-pip install build
+- name: Install poetry
+uses: abatilo/actions-poetry@v2
+- name: Define a cache for the virtual environment based on the dependencies lock file
+uses: actions/cache@v3
+with:
+path: ./.venv
+key: venv-${{ hashFiles('poetry.lock') }}
+- name: Install the package with doc dependencies
+run: poetry install --without docs,tests
 - name: Build package
-run: python -m build
+run: poetry build
 - name: Publish package
-uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29
+uses: pypa/gh-action-pypi-publish@release/v1
 with:
 user: ${{ secrets.PYPI_USERNAME }}
 password: ${{ secrets.PYPI_PASSWORD }}
```
15 changes: 9 additions & 6 deletions .github/workflows/tests.yml
```diff
@@ -18,10 +18,13 @@ jobs:
 uses: actions/setup-python@v4
 with:
 python-version: "3.10"
-
+- name: Install poetry
+uses: abatilo/actions-poetry@v2
+- name: Define a cache for the virtual environment based on the dependencies lock file
+uses: actions/cache@v3
+with:
+path: ./.venv
+key: venv-${{ hashFiles('poetry.lock') }}
 - run: |
-pip install rdkit==2023.09.6
-pip install -e ".[tests]"
-pip install pre-commit
-pre-commit install && pre-commit run --all
-- run: pytest --cov=pdbeccdutils
+poetry install --with tests
+poetry run pytest --cov=pdbeccdutils
```
28 changes: 19 additions & 9 deletions .pre-commit-config.yaml
```diff
@@ -13,15 +13,25 @@ repos:
 - id: fix-byte-order-marker
 - id: end-of-file-fixer
 - id: check-ast
+- id: no-commit-to-branch

-- repo: https://github.com/ambv/black
-rev: 22.3.0
+- repo: https://github.com/astral-sh/ruff-pre-commit
+rev: v0.3.5 # Ruff version.
 hooks:
-- id: black
-
-- repo: https://github.com/pycqa/flake8
-rev: 3.9.1
+- id: ruff
+args: [--fix]
+- id: ruff-format
+- repo: https://github.com/dzhu/rstfmt
+rev: v0.0.14
 hooks:
-- id: flake8
-args: ["--max-line-length=88", "--ignore=E501,W503"]
-exclude: \.cif$
+- id: rstfmt
+name: rST Formatter
+- repo: https://github.com/python-poetry/poetry
+rev: "1.8.2"
+hooks:
+- id: poetry-check
+- id: poetry-lock
+args:
+- "--no-update"
+- "--check"
+
```
7 changes: 7 additions & 0 deletions CHANGELOG.md
```diff
@@ -1,5 +1,12 @@
 # Changelog

+## RELEASE 0.8.6 - Oct 28, 2024
+
+### Features
+* Enable parsing of a subset of CCDs from the Chemical Component Dictionary
+* Added CCDC to UniChem resources
+
+
 ## RELEASE 0.8.5 - May 26, 2024

 ### Features
```
53 changes: 53 additions & 0 deletions CITATION.cff
```diff
@@ -0,0 +1,53 @@
+cff-version: 1.2.0
+message: "If you use this software, please cite it as below."
+authors:
+- family-names: "Kunnakkattu"
+given-names: "Ibrahim Roshan"
+orcid: "https://orcid.org/0000-0002-8646-0969"
+- family-names: "Pravda"
+given-names: "Lukas"
+- family-names: "Yuan"
+given-names: "Qi"
+- family-names: "S.Smart"
+given-names: "Oliver"
+- family-names: "Nadzirin"
+given-names: "Nurul"
+- family-names: "Anyango"
+given-names: "Stephen"
+- family-names: "Nair"
+given-names: "Sreenath"
+
+title: "PDBe CCDUtils"
+version: 0.8.5
+date-released: 22/05/2024
+url: "https://github.com/PDBeurope/ccdutils"
+preferred-citation:
+type: article
+authors:
+- family-names: "Kunnakkattu"
+given-names: "Ibrahim Roshan"
+orcid: "https://orcid.org/0000-0002-8646-0969"
+- family-names: "Choudhary"
+given-names: "Preeti"
+orcid: "https://orcid.org/0000-0003-2340-3278"
+- family-names: "Pravda"
+given-names: "Lukas"
+- family-names: "Yuan"
+given-names: "Qi"
+- family-names: "S.Smart"
+given-names: "Oliver"
+- family-names: "Nadzirin"
+given-names: "Nurul"
+- family-names: "Anyango"
+given-names: "Stephen"
+- family-names: "Nair"
+given-names: "Sreenath"
+- family-names: "Velankar"
+given-names: "Sameer"
+orcid: "https://orcid.org/0000-0002-8439-5964"
+doi: "10.1186/s13321-023-00786-w"
+journal: "Journal of Cheminformatics"
+month: 12
+title: "PDBe CCDUtils: an RDKit-based toolkit for handling and analysing small molecules in the Protein Data Bank"
+volume: 15
+year: 2023
```
101 changes: 54 additions & 47 deletions README.md
````diff
@@ -1,76 +1,83 @@
-[![CodeFactor](https://www.codefactor.io/repository/github/pdbeurope/ccdutils/badge/master)](https://www.codefactor.io/repository/github/pdbeurope/ccdutils/overview/master) ![PYPi](https://img.shields.io/pypi/v/pdbeccdutils?color=green&style=flat) ![GitHub](https://img.shields.io/github/license/pdbeurope/ccdutils) ![ccdutils documentation](https://github.com/PDBeurope/ccdutils/workflows/ccdutils%20documentation/badge.svg) ![ccdutils tests](https://github.com/PDBeurope/ccdutils/workflows/ccdutils%20tests/badge.svg)
+[![CodeFactor](https://www.codefactor.io/repository/github/PDBeurope/ccdutils/badge/master)](https://www.codefactor.io/repository/github/PDBeurope/ccdutils/overview/master) ![PYPi](https://img.shields.io/pypi/v/pdbeccdutils?color=green&style=flat) ![GitHub](https://img.shields.io/github/license/PDBeurope/ccdutils) ![ccdutils documentation](https://github.com/PDBeurope/ccdutils/workflows/ccdutils%20documentation/badge.svg) ![ccdutils tests](https://github.com/PDBeurope/ccdutils/workflows/ccdutils%20tests/badge.svg) ![PyPI Downloads](https://img.shields.io/pypi/dm/pdbeccdutils)


 # pdbeccdutils

-* A set of python tools to deal with PDB chemical components definitions
-for small molecules, taken from the [wwPDB Chemical Component Dictionary](https://www.wwpdb.org/data/ccd) and [wwPDB The Biologically Interesting Molecule Reference Dictionary](https://www.wwpdb.org/data/bird)
+An RDKit-based python toolkit for parsing and processing small molecule definitions in [wwPDB Chemical Component Dictionary](https://www.wwpdb.org/data/ccd) and [wwPDB The Biologically Interesting Molecule Reference Dictionary](https://www.wwpdb.org/data/bird).`pdbeccdutils` provides streamlined access to all metadata of small molecules in the PDB and offers a set of convenient methods to compute various properties of small molecules using RDKIt such as 2D depictions, 3D conformers, physicochemical properties, matching common fragments and scaffolds, mapping to small-molecule databases using UniChem.
+
+## Features

-* The tools use:
-* [RDKit](http://www.rdkit.org/) for chemistry. Presently tested with `2022.09.4`
 * `gemmi` CCD read/write.
+* Generation of 2D depictions (`No image available` generated if the flattening cannot be done) along with the quality check.
+* Generation of 3D conformations.
+* Fragment library search (PDBe hand-curated library, ENAMINE, DSI).
+* Chemical scaffolds (Murcko scaffold, Murcko general, BRICS).
+* Lightweight implementation of [parity method](https://doi.org/10.1016/j.str.2018.02.009) by Jon Tyzack.
+* RDKit molecular properties per component.
+* UniChem mapping.
+* Generating complete representation of multiple [Covalently Linked Components (CLC)](https://www.ebi.ac.uk/pdbe/news/introducing-covalently-linked-components)

+## Dependencies
+
+* [RDKit](http://www.rdkit.org/) for small molecule representation. Presently tested with `2023.9.6`
+* [GEMMI](https://gemmi.readthedocs.io/en/latest/index.html) for parsing mmCIF files.
+* [scipy](https://www.scipy.org/) for depiction quality check.
+* [numpy](https://www.numpy.org/) for molecular scaling.
+* [networkx](https://networkx.org/) for bound-molecules.

-* Please note that the project is under active development.

-## Installation instructions
+## Installation

-* `pdbeccdutils` requires RDKit to be installed.
-The official RDKit documentation has [installation instructions for a variety of platforms](http://www.rdkit.org/docs/Install.html).
-For Linux/macOS this is most easily done using the Anaconda Python with commands similar to:
+create a [virtual environment](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/#create-and-use-virtual-environments) and install using pip

-```console
-conda create -n rdkit-env rdkit python=3.9
-conda activate rdkit-env
-```
-
-* Once you have installed RDKit, as described above then install `pdbeccdutils` using `pip`:
-
-```console
+```bash
 pip install pdbeccdutils
 ```

-## Features
+## Contribution
+We encourage you to contribute to this project. The package uses [poetry](https://python-poetry.org/) for packaging and dependency management. You can develop locally using:

-* `gemmi` CCD read/write.
-* Generation of 2D depictions (`No image available` generated if the flattening cannot be done) along with the quality check.
-* Generation of 3D conformations.
-* Fragment library search (PDBe hand-curated library, ENAMINE, DSI).
-* Chemical scaffolds (Murcko scaffold, Murcko general, BRICS).
-* Lightweight implementation of [parity method](https://doi.org/10.1016/j.str.2018.02.009) by Jon Tyzack.
-* RDKit molecular properties per component.
-* UniChem mapping.
+```bash
+git clone https://github.com/PDBeurope/ccdutils.git
+cd ccdutils
+pip install poetry
+poetry install --with tests,docs
+pre-commit install
+```

-## TODO list
+The pre-commit hook will run linting, formatting and update `poetry.lock`. The `poetry.lock` file will lock all dependencies and ensure that they match pyproject.toml versions.

-* Add more unit/regression tests to get higher code coverage.
-* Further improvements of the documentation.
+To add a new dependency

+```bash
+# Latest resolvable version
+poetry add <package>

-## Documentation
+# Optionally fix a version
+poetry add <package>@<version>
+```

+To change a version of a dependency, either edit pyproject.toml and run:

-The documentation depends on the following packages:
+```bash
+poetry sync --with dev
+```

-* `sphinx`
-* `sphinx_rtd_theme`
-* `myst-parser`
-* `sphinx-autodoc-typehints`
+or

-Note that `sphinx` needs to be a part of the virtual environment, if you want to generate documentation by yourself.
-Otherwise it cannot pick `rdkit` module. `sphinx_rtd_theme` is a theme providing nice `ReadtheDocs` mobile friendly style.
+```bash
+poetry add <package>@<version>
+```

-* Generate *.rst* files to be included as a part of the documentation. Inside the directory `pdbeccdutils/doc` run the following commands to generate documentation.
-* Alternatively, use the `myst-parser` package to get the Markdown working.

-Use the following to generate initial markup files to be used by sphinx. This needs to be used when adding another sub-packages.
+## Documentation

-```console
-sphinx-apidoc -f -o /path/to/output/dir ../pdbeccdutils/
-```
+The documentation is generated using `sphinx` in `sphinx_rtd_theme` and hosted on GitHub Pages. To generate the documentation locally,

-Use this to re-generate the documentation from the doc/ directory:
+```bash
+cd doc
+poetry run sphinx-build -b html . _build/html

-```console
-make html
+# See the documentation at http://localhost:8080.
+python -m http.server 8080 -d _build/html
 ```
````
6 changes: 3 additions & 3 deletions doc/conf.py
```diff
@@ -13,17 +13,17 @@
 # documentation root, use os.path.abspath to make it absolute, like shown here.
 #

-import pdbeccdutils
+import importlib.metadata

 # region Project information
 project = "pdbeccdutils"
 copyright = "2020, Protein Data Bank in Europe"
 author = "Protein Data Bank in Europe"

 # The short X.Y version
-version = pdbeccdutils.__version__
+version = importlib.metadata.version("pdbeccdutils")
 # The full version, including alpha/beta/rc tags
-release = pdbeccdutils.__version__
+release = importlib.metadata.version("pdbeccdutils")

 # endregion

```
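After `pdbeccdutils/__init__.py` is deleted, `pdbeccdutils.__version__` no longer exists. `conf.py` therefore reads the version from the installed distribution's metadata, which Poetry fills in from `pyproject.toml`. This works only when the package is installed in the environment that runs Sphinx, and the documentation workflow's `poetry install --with docs` step ensures that. The snippet below is a minimal sketch of the same lookup. `installed_version` is a hypothetical helper, not part of the commit. It adds a fallback, which the committed `conf.py` does not have, so a build from an uninstalled checkout does not abort with `PackageNotFoundError`:

```python
import importlib.metadata


def installed_version(dist_name: str, fallback: str = "unknown") -> str:
    """Return the installed version of *dist_name*, or *fallback* if it is absent.

    Hypothetical helper: the committed conf.py calls
    importlib.metadata.version("pdbeccdutils") directly, with no fallback.
    """
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        # Raised when no distribution with this name is installed.
        return fallback


# Sphinx reads these module-level names from conf.py.
version = installed_version("pdbeccdutils")
release = version
```

Swallowing the error replaces a loud failure with a placeholder version in the rendered docs. Whether that trade-off is acceptable depends on how the docs are built.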
1 change: 0 additions & 1 deletion pdbeccdutils/__init__.py

This file was deleted.

19 changes: 15 additions & 4 deletions pdbeccdutils/computations/parity_method.py
```diff
@@ -5,6 +5,8 @@
 from rdkit import Chem
 from rdkit.Chem import rdFMCS

+from pdbeccdutils.helpers import mol_tools
+from rdkit.Chem import BondType
 from pdbeccdutils.core.models import ParityResult


@@ -88,8 +90,17 @@ def compare_molecules(template, query, thresh=0.01, exact_match=False):
 ParityResult: Result of the PARITY comparison.
 """

-template_atoms = template.GetNumAtoms()
-query_atoms = query.GetNumAtoms()
+template_copy = Chem.RWMol(template)
+query_copy = Chem.RWMol(query)
+
+# changing bondtype from DATIVE to ZERO as the SMARTS with DATIVE bondtype were missing
+# substructures using GetSubstructMatches (e.g. HEM)
+# refer rdkit github issue https://github.com/rdkit/rdkit/issues/7280
+mol_tools.change_bonds_type(template_copy, BondType.DATIVE, BondType.ZERO)
+mol_tools.change_bonds_type(query_copy, BondType.DATIVE, BondType.ZERO)
+
+template_atoms = template_copy.GetNumAtoms()
+query_atoms = query_copy.GetNumAtoms()

 min_num_atoms = min(template_atoms, query_atoms)
 max_sim_score = float(min_num_atoms) / float(
@@ -101,15 +112,15 @@ def compare_molecules(template, query, thresh=0.01, exact_match=False):

 if not exact_match:
 mcs_graph = rdFMCS.FindMCS(
-[template, query],
+[template_copy, query_copy],
 bondCompare=rdFMCS.BondCompare.CompareAny,
 atomCompare=rdFMCS.AtomCompare.CompareAny,
 timeout=40,
 completeRingsOnly=True,
 )
 else:
 mcs_graph = rdFMCS.FindMCS(
-[template, query],
+[template_copy, query_copy],
 bondCompare=rdFMCS.BondCompare.CompareOrderExact,
 atomCompare=rdFMCS.AtomCompare.CompareElements,
 timeout=40,
```
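The parity change works on editable copies (`Chem.RWMol(template)`, `Chem.RWMol(query)`), so the caller's molecules keep their dative bonds. On those copies, DATIVE bonds are retyped as ZERO before `rdFMCS.FindMCS` runs. The body of `mol_tools.change_bonds_type` is not part of this diff. The function below is therefore a hypothetical stand-in with the same call shape, not the library's implementation. It assumes RDKit is installed and uses a toy N→Fe molecule rather than HEM:

```python
from rdkit import Chem
from rdkit.Chem import BondType


def change_bonds_type(mol: Chem.RWMol, old: BondType, new: BondType) -> int:
    """Hypothetical stand-in for mol_tools.change_bonds_type.

    Retypes every bond of type *old* to *new* in place and returns how many changed.
    """
    changed = 0
    for bond in mol.GetBonds():
        if bond.GetBondType() == old:
            bond.SetBondType(new)
            changed += 1
    return changed


def dative_free_copy(mol: Chem.Mol) -> Chem.RWMol:
    """Return an editable copy with DATIVE bonds set to ZERO; *mol* is left untouched."""
    patched = Chem.RWMol(mol)
    change_bonds_type(patched, BondType.DATIVE, BondType.ZERO)
    return patched


# Toy input: one N -> Fe dative bond, built atom by atom (no sanitization involved).
toy = Chem.RWMol()
n_idx = toy.AddAtom(Chem.Atom(7))
fe_idx = toy.AddAtom(Chem.Atom(26))
toy.AddBond(n_idx, fe_idx, BondType.DATIVE)

patched = dative_free_copy(toy)
print(toy.GetBondWithIdx(0).GetBondType(), patched.GetBondWithIdx(0).GetBondType())
```

Copying first is the design choice here: the retyping exists only to work around substructure matching, so it should not leak into the molecules the caller passed in.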
2 changes: 1 addition & 1 deletion pdbeccdutils/core/boundmolecule.py
```diff
@@ -46,7 +46,7 @@ def infer_bound_molecules(structure, to_discard, assembly=False):
 bm = BoundMolecule(subgraph)
 bound_molecules.append(bm)

-bound_molecules = sorted(bound_molecules, key=lambda l: -len(l.graph.nodes))
+bound_molecules = sorted(bound_molecules, key=lambda item: -len(item.graph.nodes))
 return bound_molecules


```
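The only change in `boundmolecule.py` renames the lambda parameter from `l` to `item`. This is presumably done to satisfy Ruff's ambiguous-name rule (E741), since a lone `l` is easily misread as `1` or `I`. The ordering is unchanged: negating the node count puts the largest bound molecules first. The sketch below uses hypothetical stand-in classes that expose only the `graph.nodes` attribute the key reads. It shows that `reverse=True` gives the same order, ties included, because Python's sort is stable in both forms:

```python
from dataclasses import dataclass, field


@dataclass
class GraphStub:
    """Hypothetical stand-in for the networkx graph held by a bound molecule."""

    nodes: list = field(default_factory=list)


@dataclass
class BoundMoleculeStub:
    """Hypothetical stand-in exposing only the .graph attribute the sort key reads."""

    name: str
    graph: GraphStub


molecules = [
    BoundMoleculeStub("a", GraphStub(["x"])),
    BoundMoleculeStub("b", GraphStub(["x", "y", "z"])),
    BoundMoleculeStub("c", GraphStub(["x", "y", "z"])),
    BoundMoleculeStub("d", GraphStub(["x", "y"])),
]

# Form used in the commit: negate the size so larger graphs sort first.
by_negated_key = sorted(molecules, key=lambda item: -len(item.graph.nodes))

# Equivalent form: reverse=True also keeps equal-sized entries in input order.
by_reverse = sorted(molecules, key=lambda item: len(item.graph.nodes), reverse=True)

print([m.name for m in by_negated_key])  # ['b', 'c', 'd', 'a']
```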