add threeLetter SEQRES #1950

Open
wants to merge 5 commits into
base: main

Conversation

jamesmkrieger
Contributor

fixes #1949

So far this works for PDB headers and atomic objects only. I'm still working on the CIF header part.

What we can do so far is the following:

In [1]: from prody import *

In [2]: polys = parsePDBHeader('1bkv', 'polymers', threeLetter=True)
@> PDB file is found in working directory (1bkv.pdb).

In [3]: for prot in polys:
   ...:     SEQRES=prot.sequence
   ...:     print (SEQRES)
   ...: 
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY

In [4]: ag = parsePDB('1bkv', compressed=False)
@> PDB file is found in working directory (1bkv.pdb).
@> 692 atoms and 1 coordinate set(s) were parsed in 0.01s.

In [5]: for chain in ag.iterChains():
   ...:     print(chain.getSequence(threeLetter=True))
   ...: 
GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY
PRO GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY
PRO GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY

In [6]: ag.ca.getSequence(threeLetter=True)
Out[6]: 'GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY'
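
A note on the outputs above: the header-derived sequence includes HYP, while the atom-based one does not. The following is a minimal sketch for tallying that difference. It works only on the printed strings (the second line of each output above), uses no ProDy calls, and `missing_residues` is a hypothetical helper name, not part of this PR.

```python
from collections import Counter

# Strings copied from the session above (second line of each output).
seqres = ("PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY "
          "LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY")
observed = ("PRO GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY "
            "PRO GLY PRO GLY PRO GLY PRO GLY")


def missing_residues(header_seq, atom_seq):
    """Multiset difference: how many more times each residue name
    occurs in header_seq than in atom_seq (hypothetical helper)."""
    return Counter(header_seq.split()) - Counter(atom_seq.split())


missing = missing_residues(seqres, observed)
print(dict(missing))  # {'HYP': 7}
```

Because the comparison ignores order, it only reports which residue names are under-represented, not where they sit in the chain.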

@jamesmkrieger
Contributor Author

ok, now the CIF version is working too:

In [1]: from prody import *

In [2]: polys = parseCIFHeader('1bkv', 'polymers', threeLetter=True)
@> CIF file is found in working directory (1bkv.cif).

In [3]: for prot in polys:
   ...:     SEQRES=prot.sequence
   ...:     print (SEQRES)
   ...: 
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY

@jamesmkrieger
Contributor Author

This now also works for a chain's getSequence with allres=True:

In [1]: from prody import *

In [2]: ag = parsePDB('1bkv', compressed=False)
@> PDB file is found in working directory (1bkv.pdb).
@> 692 atoms and 1 coordinate set(s) were parsed in 0.01s.

In [3]: for chain in ag.iterChains():
   ...:     print(chain.getSequence(threeLetter=True, allres=True))
   ...: 
HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY ACY ACY ACY ACY HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY ACY HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH
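
With allres=True the output also lists waters (HOH) and acetate (ACY) after the polymer residues. Here is a rough sketch of separating them after the fact, assuming the space-separated format shown above. `split_hetero` and its default `hetero_names` are hypothetical and chosen only to match this structure.

```python
from collections import Counter


def split_hetero(seq, hetero_names=("HOH", "ACY")):
    """Separate polymer residue names from hetero names in a
    space-separated sequence string (hypothetical helper).

    hetero_names is an assumption tuned to the 1bkv output above.
    """
    hetero = set(hetero_names)
    tokens = seq.split()
    polymer = [name for name in tokens if name not in hetero]
    others = Counter(name for name in tokens if name in hetero)
    return polymer, others


# Shortened, illustrative input in the same format as the output above.
polymer, others = split_hetero("PRO HYP GLY ACY ACY HOH HOH HOH")
print(" ".join(polymer))  # PRO HYP GLY
print(dict(others))       # {'ACY': 2, 'HOH': 3}
```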

Contributor

@karolamik13 karolamik13 left a comment

Works fine. Check.

In [1]: from prody import *

In [2]: ag = parsePDB('1bkv', compressed=False)
@> Connecting wwPDB FTP server RCSB PDB (USA).
@> 1bkv downloaded (1bkv.pdb)
@> PDB download via FTP completed (1 downloaded, 0 failed).
@> 692 atoms and 1 coordinate set(s) were parsed in 0.02s.

In [3]: for chain in ag.iterChains():
   ...:     print(chain.getSequence(threeLetter=True, allres=True))
   ...:
HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY ACY ACY ACY ACY HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY ACY HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH HOH

In [4]: for chain in ag.iterChains():
   ...:     print(chain.getSequence(threeLetter=True))
   ...:
GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY
PRO GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY
PRO GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY

In [5]: polys = parseCIFHeader('1bkv', 'polymers', threeLetter=True)
@> Connecting wwPDB FTP server RCSB PDB (USA).
@> 1bkv downloaded (1bkv.cif)
@> PDB download via FTP completed (1 downloaded, 0 failed).

In [6]: for prot in polys:
   ...:     SEQRES=prot.sequence
   ...:     print (SEQRES)
   ...:
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY
PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY PRO HYP GLY

In [7]: ag.ca.getSequence(threeLetter=True)
Out[7]: 'GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY PRO GLY ILE THR GLY ALA ARG GLY LEU ALA GLY PRO GLY PRO GLY PRO GLY PRO GLY'

@jamesmkrieger
Contributor Author

Actually, I should probably call it something different, and I think I still need to add it to the docs and write some tests.

It should hopefully work for 4-letter resnames like TIP3 or HISE if we parse them that way, but I need to check.

@jamesmkrieger jamesmkrieger marked this pull request as draft September 12, 2024 13:54
@jamesmkrieger
Contributor Author

ok, this doesn't completely work for 4 characters like HISE, because they don't get selected in chain.calpha

@jamesmkrieger jamesmkrieger marked this pull request as ready for review September 12, 2024 14:11
@jamesmkrieger
Contributor Author

> ok, this doesn't completely work for 4 characters like HISE, because they don't get selected in chain.calpha

Now it works after adding HISE and various others from the InSty list to NONSTANDARD in atomic.flags:

In [2]: ag = parsePDB('nvt1_nojump_end.pdb', long_resname=True)

In [3]: ag.getSequence(longSeq=True)[1152:1165]
Out[3]: 'GLU GLU HISE '

In [4]: ag.ca.getSequence(longSeq=True)[60:80]
Out[4]: 'THR VAL GLN GLU HISE'

In [5]: chain = ag['A']

In [6]: chain.getSequence(longSeq=True)[60:80]
Out[6]: 'THR VAL GLN GLU HISE'
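
The slices above index characters, not residues, so the offsets shift once 4-letter names are mixed with 3-letter ones. A minimal sketch of residue-level slicing follows, assuming only that the returned string is space-separated as shown. `residue_slice` is a hypothetical helper and the input string is illustrative, not taken from nvt1_nojump_end.pdb.

```python
def residue_slice(seq, start, stop):
    """Return residue names start..stop-1 (0-based) from a
    space-separated sequence string (hypothetical helper)."""
    return seq.split()[start:stop]


# Illustrative string mixing 3- and 4-letter residue names.
seq = "HISE ALA HISE GLY THR"

# Residue-level indexing is unaffected by name length.
print(residue_slice(seq, 3, 5))  # ['GLY', 'THR']

# Character offsets that assume 4 characters per residue drift instead.
print(repr(seq[3 * 4:5 * 4]))    # 'E GLY TH'
```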

Development

Successfully merging this pull request may close these issues.

How to get original SEQRES?
2 participants