How to get original SEQRES? #1949

lokapal · 2024-09-09T16:08:03Z

I need in SEQRES in the original non-translated to single-letter code, because I need to find non-standard aminoacids that are presenting in some proteins (more frequently they are hydroxyproline and hydroxylysine). I can detect them by scanning .resname from chemical and search for them into original SEQRES. Right now it doesn't work obviously.

from prody import *

chems = parsePDBHeader('1mkp', 'chemicals')
for chem in chems:
    print(chem.resname)

polys = parsePDBHeader('1mkp', 'polymers')
for prot in polys:
    SEQRES=prot.sequence
    print (SEQRES)

I'd like to get in SEQRES the list of strings or at least the one big string with three-letters codes and spaces. Are there means to do that?
Moreover, your forced conversion makes errors:
1BKV:

SEQRES   1 A   30  PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA
SEQRES   2 A   30  ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP
SEQRES   3 A   30  GLY PRO HYP GLY
SEQRES   1 B   30  PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA
SEQRES   2 B   30  ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP
SEQRES   3 B   30  GLY PRO HYP GLY
SEQRES   1 C   30  PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA
SEQRES   2 C   30  ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP

prot.sequence:

PPGPPGPPGITGARGLAGPPGPPGPPGPPG

We know from MODRES, that oxyprolines are at the following places:

MODRES 1BKV HYP A    2  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP A    5  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP A    8  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP A   20  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP A   23  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP A   26  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP A   29  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP B   32  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP B   35  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP B   38  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP B   50  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP B   53  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP B   56  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP B   59  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP C   62  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP C   65  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP C   68  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP C   80  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP C   83  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP C   86  PRO  4-HYDROXYPROLINE
MODRES 1BKV HYP C   89  PRO  4-HYDROXYPROLINE..

ProDy replaces them ALL by normal Proline, that is error obviously

The text was updated successfully, but these errors were encountered:

karolamik13 · 2024-09-09T19:35:14Z

Hello,
It will not solve the problem with SEQRES, but a few days ago, I wrote a function in ProDy that might be useful for you in detecting non-standard residues (see below). I can add an additional option to display residue numbers as well. It will be available soon in ProDy in the GitHub version once it is accepted by my colleagues (#1948).

In [19]: p = parsePDB('1BKV')
@> PDB file is found in working directory (1bkv.pdb.gz).
@> 692 atoms and 1 coordinate set(s) were parsed in 0.02s.

In [20]: checkNonstandardResidues(p.select('name CA'))
@> There are several non-standard residues in the structure.
@> Replace the non-standard name in the PDB file with the equivalent name from the standard one if you want to include them in the interactions.
@> Residues: HYP HYP HYP HYP HYP HYP HYP HYP HYP HYP HYP HYP HYP HYP HYP HYP HYP HYP HYP HYP HYP.

karolamik13 · 2024-09-09T19:45:39Z

Actually, I did it already. It is useful to have resid as well.

In [1]: from prody import *

In [2]: p = parsePDB('1BKV')
@> PDB file is found in working directory (1bkv.pdb.gz).
@> 692 atoms and 1 coordinate set(s) were parsed in 0.01s.

In [3]: checkNonstandardResidues(p.select('name CA'))
@> There are several non-standard residues in the structure.
@> Replace the non-standard name in the PDB file with the equivalent name from the standard one if you want to include them in the interactions.
@> Residues: HYP2 HYP5 HYP8 HYP20 HYP23 HYP26 HYP29 HYP32 HYP35 HYP38 HYP50 HYP53 HYP56 HYP59 HYP62 HYP65 HYP68 HYP80 HYP83 HYP86 HYP89.

jamesmkrieger · 2024-09-10T07:32:43Z

I think we probably should be able to parse the seqres and return the 3 character version, but .getSequence is probably not the answer.

In the 1 letter version, we do alias them as the nearest standard amino acid and also there’s a small annoyance that you get the sequence repeated across for every atom unless you use a Chain object or select ca atoms.

We’ll have a look but think this shouldn’t be too hard

jamesmkrieger · 2024-09-10T07:45:50Z

For what it's worth, we do have this:

In [11]: polys = parsePDBHeader('1bkv', 'polymers')
@> PDB file is found in working directory (1bkv.pdb).

In [12]: polys[0].modified
Out[12]: 
[('HYP', 'A', '2', 'PRO', '4-HYDROXYPROLINE'),
 ('HYP', 'A', '5', 'PRO', '4-HYDROXYPROLINE'),
 ('HYP', 'A', '8', 'PRO', '4-HYDROXYPROLINE'),
 ('HYP', 'A', '20', 'PRO', '4-HYDROXYPROLINE'),
 ('HYP', 'A', '23', 'PRO', '4-HYDROXYPROLINE'),
 ('HYP', 'A', '26', 'PRO', '4-HYDROXYPROLINE'),
 ('HYP', 'A', '29', 'PRO', '4-HYDROXYPROLINE')]

jamesmkrieger · 2024-09-10T08:18:42Z

ok, I have a solution at a new pull request that I'm about to make. It will give you a new keyword argument threeLetter than you can use to get your desired behaviour:

In [1]: from prody import *

In [2]: polys = parsePDBHeader('1bkv', 'polymers', threeLetter=True)
@> PDB file is found in working directory (1bkv.pdb).

In [3]: for prot in polys:
   ...:     SEQRES=prot.sequence
   ...:     print (SEQRES)
   ...: 
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY

In [4]: ag = parsePDB('1bkv', compressed=False)
@> PDB file is found in working directory (1bkv.pdb).
@> 692 atoms and 1 coordinate set(s) were parsed in 0.01s.

In [5]: for chain in ag.iterChains():
   ...:     print(chain.getSequence(threeLetter=True))
   ...: 
GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY
PRO GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY
PRO GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY

In [6]: ag.ca.getSequence(threeLetter=True)
Out[6]: 'GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY'

jamesmkrieger · 2024-09-10T08:48:31Z

ok, I think this is now essentially ready. Please feel free to look over the pull request and let us know if we can add anything else and to check out the branch jamesmkrieger:seqres_threeLetter and try it.

jamesmkrieger · 2024-10-09T09:18:14Z

@lokapal please let us know if these comments and changes meet your requirements

lokapal · 2024-10-15T23:24:51Z

All is fixed right now, as it seems to me. Thank you very much!

jamesmkrieger · 2024-10-16T07:08:20Z

You’re welcome

jamesmkrieger mentioned this issue Sep 10, 2024

add threeLetter SEQRES #1950

Open

lokapal closed this as completed Oct 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to get original SEQRES? #1949

How to get original SEQRES? #1949

lokapal commented Sep 9, 2024 •

edited

Loading

karolamik13 commented Sep 9, 2024

karolamik13 commented Sep 9, 2024

jamesmkrieger commented Sep 10, 2024

jamesmkrieger commented Sep 10, 2024

jamesmkrieger commented Sep 10, 2024

jamesmkrieger commented Sep 10, 2024

jamesmkrieger commented Oct 9, 2024

lokapal commented Oct 15, 2024

jamesmkrieger commented Oct 16, 2024

How to get original SEQRES? #1949

How to get original SEQRES? #1949

Comments

lokapal commented Sep 9, 2024 • edited Loading

karolamik13 commented Sep 9, 2024

karolamik13 commented Sep 9, 2024

jamesmkrieger commented Sep 10, 2024

jamesmkrieger commented Sep 10, 2024

jamesmkrieger commented Sep 10, 2024

jamesmkrieger commented Sep 10, 2024

jamesmkrieger commented Oct 9, 2024

lokapal commented Oct 15, 2024

jamesmkrieger commented Oct 16, 2024

lokapal commented Sep 9, 2024 •

edited

Loading