A Python 3 protein structure fetcher. It retrieves cif or pdb files from either the RCSB Protein Data Bank (PDB, via pypdb) or AlphaFold, using the UniProt ID.
Please install the latest version of PyPDB using:
pip install pypdb
or
pip install git+https://github.com/williamgilpin/pypdb
Install profet using pip:
pip install profet
To install the development version, which contains the latest features and fixes, install directly from GitHub using:
pip install git+https://github.com/alan-turing-institute/profet
To test the installation, you need the pytest and pytest-cov packages, which can be installed as follows:
pip install pytest pytest-cov
Then navigate to the root directory of the package and run
pytest
This code has been designed and tested for Python 3.
This package can be used to retrieve the available protein structure for any UniProt ID. It can also be used to automatically remove signal peptides from the structure.
The Fetcher class can search for IDs in both PDB and AlphaFold, and saves the search results in a dictionary.
get_file returns the structure corresponding to uniprot_id in the requested filetype (default 'pdb', option 'cif'), searching first in the default database db (default 'pdb', option 'alphafold').
The files can be saved locally with filesave: they are saved as uniprotID.<filetype>, except when they are fetched from PDB, in which case they are saved as uniprotID_pdbID.<filetype>.
set_default_db changes the default database to either 'pdb' or 'alphafold'.
set_directory changes the directory where the files are saved. Files are saved as <directory>/<id>.<filetype>.
Run search_history() to see the search history of the fetcher.
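The search order and file naming described above can be sketched without touching the network. This is only an illustration under stated assumptions: search_order and expected_path are hypothetical helpers, not part of profet; the fallback to the second database is inferred from the phrase "searching first"; and "1ABC" is a made-up PDB ID. The real package may also change the case of the IDs (the cleaving example in this README shows lower-case file names).

```python
from pathlib import PurePosixPath
from typing import List, Optional

# The two databases named in this README.
DATABASES = ("pdb", "alphafold")


def search_order(default_db: str = "pdb") -> List[str]:
    """Hypothetical helper: the default database first, then the other one.

    Assumes the fetcher falls back to the remaining database.
    """
    if default_db not in DATABASES:
        raise ValueError(f"unknown database: {default_db!r}")
    return [default_db] + [db for db in DATABASES if db != default_db]


def expected_path(directory: str, uniprot_id: str, filetype: str = "pdb",
                  pdb_id: Optional[str] = None) -> str:
    """Hypothetical helper: build <directory>/<id>.<filetype>.

    pdb_id is given only for files fetched from PDB, which are named
    uniprotID_pdbID.<filetype> according to the description above.
    """
    if filetype not in ("pdb", "cif"):
        raise ValueError(f"unknown filetype: {filetype!r}")
    stem = uniprot_id if pdb_id is None else f"{uniprot_id}_{pdb_id}"
    return str(PurePosixPath(directory) / f"{stem}.{filetype}")


print(search_order("alphafold"))
print(expected_path("/path/to/directory/folder", "P61316"))
print(expected_path("/path/to/directory/folder", "P61316", "cif", pdb_id="1ABC"))
```

Keeping the naming rule in one small function makes it easy to check whether a file is already present before asking the fetcher to download it again.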
import profet as pf
fetcher = pf.Fetcher()
fetcher.set_directory("/path/to/directory/folder")
fetcher.get_file(uniprot_id = "P61316", filetype = "pdb", filesave = True, db = "alphafold")
fetcher.search_history()
returns:
{'P61316': ['pdb', 'alphafold']}
This loads profet and creates a Fetcher, then specifies the directory in which to save the files.
Lastly, it downloads the protein with UniProt ID "P61316" in pdb format from the AlphaFold database and saves it in the specified directory.
For more detailed examples consult the following Python notebook.
Once a structure is downloaded using get_file, the signal cleaving function cleave_off_signal_peptides of the Fetcher class compares the sequence of the structure against the UniProt database to find any signal peptides included in the structure. It then automatically deletes the signal peptides from the structure.
The cleaved structure is saved as a separate file, with the deleted residue positions added to the filename. If no signal peptides are detected, a new file named "structure-ID_None.cif/.pdb" is saved.
import profet as pf
fetcher = pf.Fetcher()
fetcher.set_directory("/path/to/directory/folder")
fetcher.get_file(uniprot_id = "P0A855", filetype = "pdb", filesave = True, db = "alphafold")
fetcher.cleave_off_signal_peptides("P0A855")
This will save p0a855.pdb and p0a855_cleaved_1to21.pdb to the specified directory.
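The naming of the cleaved file, and what removing residues 1 to 21 means for a sequence, can be sketched as follows. Both functions are hypothetical illustrations written for this README and are not profet's implementation; the residue range is assumed to be 1-based and inclusive, matching the "1to21" in the file name above.

```python
from typing import Optional, Tuple


def cleaved_filename(structure_id: str, filetype: str,
                     removed: Optional[Tuple[int, int]] = None) -> str:
    """Hypothetical helper: name of the file written after cleaving.

    removed is the (first, last) residue range that was deleted,
    or None when no signal peptide was detected.
    """
    if removed is None:
        return f"{structure_id}_None.{filetype}"
    first, last = removed
    return f"{structure_id}_cleaved_{first}to{last}.{filetype}"


def cleave_sequence(sequence: str,
                    removed: Optional[Tuple[int, int]] = None) -> str:
    """Hypothetical helper: drop residues first..last (1-based, inclusive)."""
    if removed is None:
        return sequence
    first, last = removed
    return sequence[:first - 1] + sequence[last:]


print(cleaved_filename("p0a855", "pdb", (1, 21)))  # p0a855_cleaved_1to21.pdb
print(cleaved_filename("p0a855", "cif"))           # p0a855_None.cif
print(cleave_sequence("ABCDEFG", (1, 3)))          # DEFG
```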
The profet library also has a command line interface that mirrors the Python API and can be used to download entries from both the PDB and AlphaFold.
An example of how to use the profet command line program is shown in the following code snippet.
profet 4v1w \
--filetype=pdb \
--main_db=pdb \
--save_directory="~/.pdb"
In this example, the entry "4V1W" is to be downloaded from the PDB database as a .pdb file. The file will be cached in the "~/.pdb" directory for future use.
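When the command line program is driven from a script, the same invocation can be assembled as an argument list. The sketch below assumes only the flags shown in the example above; build_profet_command is a hypothetical helper and not part of profet, and the command is printed rather than executed.

```python
import shlex
from typing import List


def build_profet_command(entry_id: str, filetype: str, main_db: str,
                         save_directory: str) -> List[str]:
    """Hypothetical helper: argument list for the profet command shown above."""
    return [
        "profet",
        entry_id,
        f"--filetype={filetype}",
        f"--main_db={main_db}",
        f"--save_directory={save_directory}",
    ]


command = build_profet_command("4v1w", "pdb", "pdb", "~/.pdb")
# Show the command as a single shell-quoted string (Python 3.8+).
print(shlex.join(command))
# With profet installed, subprocess.run(command, check=True) would execute it.
```

Passing a list instead of a single string avoids shell quoting problems when the save directory contains spaces.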
You can find more documentation, including a description of the Python API, here.
If you run into an issue, or if you find a workaround for an existing issue, we would very much appreciate it if you could post your question or code as a GitHub issue.
If you would like to help contribute to profet, please read our contribution guide and code of conduct.