OBBioMatchError #4

Open
kuraisle opened this issue May 11, 2021 · 5 comments

@kuraisle

Hi,

I'm trying to run a structure through arpeggio. It works fine on the web server. I've used the "clean_pdb.py" script from PDBTools on it and I still get the OBBioMatchError. None of my atoms have alternative locations and all of my atom serial numbers are unique. If there's anything else I could try, that would be great!

Thanks,
James
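
The two checks mentioned above (unique serial numbers, no alternate locations) can be scripted. This is a minimal sketch, not part of arpeggio or PDBTools: `check_pdb_atoms` and the example records are hypothetical, and it assumes a standard fixed-column PDB file with plain decimal serial numbers.

```python
from collections import Counter


def check_pdb_atoms(pdb_text):
    """Return (duplicate_serials, altloc_serials) for ATOM/HETATM records."""
    serials = []
    altloc_serials = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        serial = int(line[6:11])      # columns 7-11: atom serial number
        altloc = line[16:17].strip()  # column 17: alternate location indicator
        serials.append(serial)
        if altloc:
            altloc_serials.append(serial)
    # A serial number that appears more than once is reported a single time.
    duplicates = sorted(s for s, n in Counter(serials).items() if n > 1)
    return duplicates, altloc_serials


# Hypothetical records: serial 2 is repeated and carries altloc flags A and B.
example = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "ATOM      2  CA AALA A   1      11.639   6.071  -5.147  0.50  0.00           C",
    "ATOM      2  CA BALA A   1      11.700   6.100  -5.100  0.50  0.00           C",
])
duplicates, altlocs = check_pdb_atoms(example)
print(duplicates, altlocs)
```

Both lists being empty for a real file would match the situation described in this report, where the error still occurs.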

@hippolytej

hippolytej commented May 11, 2021

Hey,

I'm having the same issue here, with a structure that works just fine on the web server, and fails with this package.

I am using OpenMM's PDBFixer to clean an input file downloaded from RCSB, attached here: clean_5lxr.pdb.zip

Steps to reproduce:

First download and unzip input file clean_5lxr.pdb.zip, then:

conda create -n arpeggio-env python=3.7
conda activate arpeggio-env
conda install -c openbabel openbabel
pip install git+https://github.com/PDBeurope/arpeggio.git@master#egg=arpeggio
arpeggio clean_5lxr.pdb

Full Traceback

INFO//16:44:50.496//Program begin.
WARNING//16:44:50.496//No selection was perceived. Defaults into full structure!!
/Users/hippolyte/opt/miniconda3/envs/arpeggio-env/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 3359.
  PDBConstructionWarning,
/Users/hippolyte/opt/miniconda3/envs/arpeggio-env/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 3367.
  PDBConstructionWarning,
/Users/hippolyte/opt/miniconda3/envs/arpeggio-env/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 3368.
  PDBConstructionWarning,
/Users/hippolyte/opt/miniconda3/envs/arpeggio-env/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 3411.
  PDBConstructionWarning,
DEBUG//16:44:50.536//Loaded PDB structure (BioPython)
DEBUG//16:44:51.450//Loaded PDB structure (OpenBabel)
ERROR//16:44:51.451//An OpenBabel atom could not be matched to a BioPython counterpart.
Traceback (most recent call last):
  File "/Users/hippolyte/opt/miniconda3/envs/arpeggio-env/lib/python3.7/site-packages/arpeggio/core/interactions.py", line 1995, in _establish_structure_mappping
    biopython_atom = serial_to_bio[serial]
KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/hippolyte/opt/miniconda3/envs/arpeggio-env/bin/arpeggio", line 8, in <module>
    sys.exit(main())
  File "/Users/hippolyte/opt/miniconda3/envs/arpeggio-env/lib/python3.7/site-packages/arpeggio/scripts/process_protein_cli.py", line 79, in main
    run_arpeggio(args)
  File "/Users/hippolyte/opt/miniconda3/envs/arpeggio-env/lib/python3.7/site-packages/arpeggio/scripts/process_protein_cli.py", line 89, in run_arpeggio
    i_complex = InteractionComplex(args.filename, args.vdw_comp, args.interacting, args.ph)
  File "/Users/hippolyte/opt/miniconda3/envs/arpeggio-env/lib/python3.7/site-packages/arpeggio/core/interactions.py", line 92, in __init__
    self._establish_structure_mappping()
  File "/Users/hippolyte/opt/miniconda3/envs/arpeggio-env/lib/python3.7/site-packages/arpeggio/core/interactions.py", line 1998, in _establish_structure_mappping
    raise OBBioMatchError(serial)
arpeggio.core.exceptions.OBBioMatchError: 0

Thanks for your help,

Hippolyte

@kuraisle
Author

Hi Hippolyte,

Strangely, I have since run the Python 2.7 version of this and it seems to work (with some warnings). I still have to clean up my structures before using them, but I get a result!

@hippolytej

Hi @kuraisle, thanks for sharing. I've also been using the py2.7 version, which works just fine with the structures that I have :)
Still, a Python 3 version would be much more practical...

@reneeotten

This is indeed a problem for every PDB file (downloading the same structure as a CIF file from RCSB works). The issue is that the Id of the first atom is 1 when the structure is read from an mmCIF file, whereas it is 0 when it is read from a PDB file.

import os

import openbabel as ob

from arpeggio.core import protein_reader


PDBid = '2ltq'
basename = os.path.join('arpeggio/tests/test_data/structures', f'{PDBid}')

ob_conv = ob.OBConversion()
ob_conv.SetInFormat('pdb')
mol = ob.OBMol()
ob_conv.ReadFile(mol, f'{basename}.pdb')
pdb_atom_iter = ob.OBMolAtomIter(mol)

mmcif = protein_reader.read_mmcif_to_openbabel(f'{basename}.cif')
mmcif_atom_iter = ob.OBMolAtomIter(mmcif)

pdb_atom = next(pdb_atom_iter)
mmcif_atom = next(mmcif_atom_iter)

print('\n*** ID for the first atom when reading in the PDB or CIF file for '
      f'the same structure ({PDBid}) ***')
print(f'pdb_atom.GetId = {pdb_atom.GetId()}')
print(f'mmcif_atom.GetId = {mmcif_atom.GetId()}')

results in:

*** ID for the first atom when reading in the PDB or CIF file for the same structure (2ltq) ***
pdb_atom.GetId = 0
mmcif_atom.GetId = 1

It seems to me that PDB files are read with the "standard" OpenBabel functions, whereas mmCIF files are read with in-house code. I haven't gone through the code in detail to find where the difference originates, but this definitely needs to be fixed, as running a PDB or an mmCIF file of the same structure should give the same result.
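
To make the link to the `KeyError: 0` in the traceback explicit, here is a minimal sketch of the lookup that fails. It does not use OpenBabel or BioPython: the lists and the `first_unmatched` helper are hypothetical stand-ins, assuming BioPython serial numbers that start at 1 and OpenBabel Ids that start at 0 as printed above.

```python
# Hypothetical stand-ins for the two atom numberings seen in this thread.
bio_serials = [1, 2, 3]   # serial numbers as written in the PDB file
pdb_ob_ids = [0, 1, 2]    # OpenBabel Ids when the PDB file is read
mmcif_ob_ids = [1, 2, 3]  # OpenBabel Ids when the mmCIF file is read

# Same shape as the `serial_to_bio` dictionary named in the traceback.
serial_to_bio = {serial: f"bio_atom_{serial}" for serial in bio_serials}


def first_unmatched(ob_ids, serial_to_bio):
    """Return the first OpenBabel Id with no BioPython counterpart, or None."""
    for ob_id in ob_ids:
        if ob_id not in serial_to_bio:
            return ob_id
    return None


print(first_unmatched(pdb_ob_ids, serial_to_bio))    # 0
print(first_unmatched(mmcif_ob_ids, serial_to_bio))  # None
```

With the 0-based Ids, the very first lookup misses, which is consistent with the error being raised for serial 0.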

@skelm

skelm commented Jun 19, 2021

Hi all,

I have a somewhat dirty fix. My _establish_structure_mappping() method in arpeggio/core/interactions.py (starting at line 1981) now looks like this:

def _establish_structure_mappping(self):
    """Maps biopython atoms to openbabel ones and vice-versa.

    Raises:
        OBBioMatchError: If we can't match an OB atom to a BioPython atom
    """
    # FIRST MAP PDB SERIAL NUMBERS TO BIOPYTHON ATOMS FOR SPEED LATER
    # THIS AVOIDS LOOPING THROUGH `s_atoms` MANY TIMES
    serial_to_bio = {x.serial_number: x for x in self.s_atoms}

    # `Id` IS A UNIQUE AND STABLE ID IN OPENBABEL
    # CAN RECOVER THE ATOM WITH `mol.GetAtomById(id)`
    serial_to_ob = {x.GetId(): x for x in ob.OBMolAtomIter(self.ob_mol)}

    if 0 in serial_to_ob and 0 not in serial_to_bio:
        offset = 1
        logging.debug('OB atom serial numbers start with 0, but there are no BioPython atoms with serial number 0. Adding 1 to all OB serials for mapping purposes.')
    else:
        offset = 0

    for serial, ob_atom in serial_to_ob.items():
        # MATCH TO THE BIOPYTHON ATOM BY SERIAL NUMBER
        try:
            biopython_atom = serial_to_bio[serial + offset]
        except KeyError:
            raise OBBioMatchError("Failed to match OB atom to a BioPython atom: {},{} : {}({})".format(serial, serial + offset, ob_atom.GetType()[:1], ob_atom.GetType()))
        
        # Sanity check that elements match, for debugging
        if ob_atom.GetType()[:1] != biopython_atom.element[:1]:
            raise OBBioMatchError("Failed to correctly match atoms with different elements: {},{} : {}({}) != {}({}, {})".format(serial, serial + offset, ob_atom.GetType()[:1], ob_atom.GetType(), biopython_atom.element[:1], biopython_atom.element, biopython_atom.name))

        self.ob_to_bio[ob_atom.GetId()] = biopython_atom
        self.bio_to_ob[biopython_atom] = ob_atom.GetId()

    logging.debug('Mapped OB to BioPython atoms and vice-versa.')

I'm happy to let the author fix it in a cleaner way, or let someone else do a proper pull request.
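
The offset heuristic in the fix above can be tried without OpenBabel or BioPython installed. This is only a sketch under stated assumptions: `map_ob_ids_to_serials` and its plain-dictionary inputs are hypothetical stand-ins for `self.ob_mol` and `self.s_atoms`, and it compares full element symbols rather than first letters.

```python
def map_ob_ids_to_serials(ob_elements, bio_elements):
    """Map OpenBabel atom Ids to BioPython serial numbers.

    ob_elements:  {OpenBabel Id: element symbol}      (hypothetical stand-in)
    bio_elements: {PDB serial number: element symbol} (hypothetical stand-in)
    """
    # Same heuristic as the fix above: shift by one only when OpenBabel
    # has an Id 0 and BioPython has no serial number 0.
    offset = 1 if 0 in ob_elements and 0 not in bio_elements else 0

    mapping = {}
    for ob_id, ob_element in ob_elements.items():
        serial = ob_id + offset
        if serial not in bio_elements:
            raise KeyError(f"no BioPython atom for OB Id {ob_id} (serial {serial})")
        # Sanity check: a wrong offset usually pairs atoms of different elements.
        if bio_elements[serial].upper() != ob_element.upper():
            raise ValueError(f"element mismatch for OB Id {ob_id} (serial {serial})")
        mapping[ob_id] = serial
    return mapping


print(map_ob_ids_to_serials({0: "N", 1: "C", 2: "O"}, {1: "N", 2: "C", 3: "O"}))
# {0: 1, 1: 2, 2: 3}
```

Note that a constant offset only lines up when both numberings are consecutive. If the serial numbers in the file have a gap (for example `{1: "N", 2: "C", 4: "O"}`), this sketch raises `KeyError` at the gap, so the sanity checks in the fix are worth keeping.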
