DeepPCT

Introduction

DeepPCT is a deep learning algorithm to identify PTM crosstalk within proteins using AlphaFold2-based structures. In this algorithm, one deep learning classifier was built for sequence-based prediction by combining the residue and residue pair embeddings with cross-attention techniques, while the other classifier was established for structure-based prediction by integrating the structural embedding and a graph neural network. Meanwhile, a machine learning classifier was established using a series of novel structural descriptors and a random forest model to complement the structural deep learning classifier. Finally, the three classifiers were merged to maximize their complementarity through a weighted combination strategy.
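The weighted combination step can be illustrated with a minimal sketch. It assumes each of the three classifiers outputs a probability in [0, 1] and that the final score is a normalized weighted average. The function name `combine_scores` and the weight values are hypothetical and are not taken from the DeepPCT code; the actual weighting scheme may differ.

```python
def combine_scores(seq_score, struct_dl_score, struct_ml_score,
                   weights=(0.4, 0.3, 0.3)):
    """Normalized weighted average of three classifier probabilities.

    The default weights are placeholders for illustration only; they are
    NOT the weights used by DeepPCT.
    """
    scores = (seq_score, struct_dl_score, struct_ml_score)
    if len(weights) != len(scores):
        raise ValueError("expected exactly three weights")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total lets callers pass unnormalized weights.
    return sum(w * s for w, s in zip(weights, scores)) / total


if __name__ == "__main__":
    # Sequence-based, structural deep learning, and random forest scores.
    print(combine_scores(0.8, 0.6, 0.5))
```

Normalizing by the sum of the weights keeps the combined score in [0, 1] whenever the individual scores are.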

Key Advantages

Improved Performance: DeepPCT outperforms state-of-the-art (SOTA) methods across different evaluation scenarios.
Fast: DeepPCT runs inference 60x faster than our previous model (PCTpred).
General: Trained on AlphaFold2 predicted structures, DeepPCT can be applied to any protein when only its sequence is available.

Usage

Installation

Clone the repository to your local device.
```
git clone https://github.com/hzau-liulab/DeepPCT
cd DeepPCT
```
Please note that this program was developed and tested on Linux CentOS 7. It is likely compatible with other Linux distributions. Windows is not supported because a third-party package relies on the fork system call, which is unavailable on that OS.
Install the necessary dependencies.
DeepPCT requires the following dependencies:
- Software
```
Python  3.9
GHECOM  (Last modified: 2021/12/01)
```
- Python packages
```
PyTorch                 1.13.0
DGL                     1.1.0
NumPy                   1.23.5
SciPy                   1.10.1
GraphRicciCurvature     0.5.3.1
RDKit                   2022.9.3
NetworkX                2.7.1
scikit-learn            1.2.2
torchdrug               0.2.0.post1
Biopython               1.78
fair-esm                2.0.0
safetensors             0.3.1
```
To avoid potential conflicts, we recommend creating a new conda environment for DeepPCT and installing the required packages within this environment.
```
conda create -n DeepPCT python=3.9.12
conda activate DeepPCT
```
For convenience, we provide an installation script to install these required Python packages.
```
source installation_script.sh
```
GHECOM requires users to provide their personal details before downloading. Please visit the GHECOM website to download its source code, and follow the provided instructions to compile it. Once compiled, move the GHECOM executable file (ghecom) to the software/ghecom directory, and then run the following command to set execution permissions:
```
chmod +x software/ghecom/ghecom
```
Download the pre-trained model weights
Our pre-trained model weights are available on Google Drive. Three additional pre-trained models are required for this program:
- ESM-2: esm2_t33_650M_UR50D.pt
- ESM-2 regression: esm2_t33_650M_UR50D-contact-regression.pt
- GearNet-Edge: mc_gearnet_edge.pth
Download and move these weights to the model_weights folder.

Run prediction

Prepare input sequences
Each input sequence should be saved in a separate FASTA file named seq_id.fasta, where the prefix seq_id will be used as an identifier of the sequence. Please place all the input FASTA files in the input/FASTA directory.
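If your sequences are stored in a single multi-record FASTA file, they need to be split first. The following is a minimal sketch that uses only the standard library. The helper names `parse_fasta` and `write_fasta_files` are hypothetical, and taking the accession from a UniProt-style header (`>sp|ACCESSION|NAME`) as the seq_id is an assumption made for this example.

```python
from pathlib import Path


def parse_fasta(text):
    """Return {seq_id: sequence} from multi-record FASTA text."""
    records = {}
    seq_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without an identifier")
            token = fields[0]
            parts = token.split("|")
            # Assumption: UniProt-style headers look like sp|ACCESSION|NAME.
            seq_id = parts[1] if len(parts) >= 3 else token
            records[seq_id] = []
        elif seq_id is not None:
            records[seq_id].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


def write_fasta_files(records, out_dir="input/FASTA"):
    """Write one seq_id.fasta file per record into out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for seq_id, sequence in records.items():
        (out_path / f"{seq_id}.fasta").write_text(f">{seq_id}\n{sequence}\n")
```

Calling `write_fasta_files(parse_fasta(Path("all.fasta").read_text()))` would then fill input/FASTA, where `all.fasta` is a placeholder for your own file.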
Prepare AlphaFold2 predicted structures
For most existing proteins, you can directly download the predicted structures from AlphaFold DB. If the predicted structures are not available, you can use the following resources to predict the structures of your protein sequences:
- ColabFold (recommended)
- AlphaFold2 on Colab (from DeepMind)
Alternatively, you can install AlphaFold2 and run the prediction locally. The predicted structures should be saved in the PDB format. The filename should be seq_id.pdb, where the seq_id should match the identifier of the sequence. Please place all the predicted structures in the input/PDB directory.
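Because every seq_id needs both a FASTA file and a PDB file with matching names, a quick check before running the prediction can catch missing inputs. This is a minimal sketch; `find_missing` and `check_input_dirs` are hypothetical helpers and are not part of DeepPCT.

```python
from pathlib import Path


def find_missing(seq_ids, fasta_names, pdb_names):
    """List the expected file names that are absent for the given seq_ids."""
    missing = []
    for seq_id in sorted(set(seq_ids)):
        if f"{seq_id}.fasta" not in fasta_names:
            missing.append(f"{seq_id}.fasta")
        if f"{seq_id}.pdb" not in pdb_names:
            missing.append(f"{seq_id}.pdb")
    return missing


def check_input_dirs(seq_ids, fasta_dir="input/FASTA", pdb_dir="input/PDB"):
    """Compare seq_ids against the files actually present on disk."""
    fasta_names = {p.name for p in Path(fasta_dir).glob("*.fasta")}
    pdb_names = {p.name for p in Path(pdb_dir).glob("*.pdb")}
    return find_missing(seq_ids, fasta_names, pdb_names)


if __name__ == "__main__":
    # Prints the missing file names, or an empty list if all are present.
    print(check_input_dirs(["P48431", "P56693"]))
```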
Prepare input file
Format each line of the input file as follows:
```
seq_id site1  site2
```
where seq_id is the identifier of the sequence, and site1 and site2 are the positions of the two PTM sites. For instance:
```
P48431	S251	K245
P48431	K245	S249
P48431	Y277	S249
P56693	T240	K55
P56693	T244	K55
```
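Malformed lines or sites that do not match the sequence are easy to introduce by hand. The sketch below parses the input file and checks each site against a sequence. It assumes fields are separated by whitespace and that each site is a one-letter residue code followed by a 1-based position, as in the example above. The helper names are hypothetical.

```python
import re

# One uppercase residue letter followed by a position, e.g. S251.
SITE_RE = re.compile(r"^([A-Z])(\d+)$")


def parse_site(site):
    """Split a site such as 'S251' into ('S', 251)."""
    match = SITE_RE.match(site)
    if match is None:
        raise ValueError(f"malformed site: {site!r}")
    return match.group(1), int(match.group(2))


def parse_pairs(text):
    """Parse input-file text into (seq_id, site1, site2) tuples."""
    pairs = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {len(fields)}")
        seq_id, site1, site2 = fields
        pairs.append((seq_id, parse_site(site1), parse_site(site2)))
    return pairs


def site_matches(sequence, site):
    """Check that the residue at the 1-based position matches the site."""
    residue, position = site
    return 1 <= position <= len(sequence) and sequence[position - 1] == residue
```

Running `site_matches` on each parsed site with the sequence from the corresponding FASTA file flags positions that are out of range or point at a different residue.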

Run the prediction
With the input sequences, structures, and input file ready, use the following command to run the prediction:

python predict.py -i path/to/input/file -o path/to/output/directory

By default, the output will be saved in TXT format. For easier parsing, you can use the --jsonl option to save the output in JSONL format.

python predict.py -i path/to/input/file -o path/to/output/directory --jsonl

Output example (TXT):

P48431	S251	K245	prediction_score	0.678	prediction_result	Positive
P48431	K245	S249	prediction_score	0.065	prediction_result	Negative
P48431	Y277	S249	prediction_score	0.038	prediction_result	Negative
P56693	T240	K55	prediction_score	0.296	prediction_result	Positive
P56693	T244	K55	prediction_score	0.224	prediction_result	Positive

Output example (JSONL):

{"seq_id": "P48431", "sites": [{"site": ["S251", "K245"], "prediction_score": 0.6783167167425156, "prediction_result": "Positive"}, {"site": ["K245", "S249"], "prediction_score": 0.06511083447009325, "prediction_result": "Negative"}, {"site": ["Y277", "S249"], "prediction_score": 0.03774637453317642, "prediction_result": "Negative"}]}
{"seq_id": "P56693", "sites": [{"site": ["T240", "K55"], "prediction_score": 0.29571153268337247, "prediction_result": "Positive"}, {"site": ["T244", "K55"], "prediction_score": 0.22387358837008475, "prediction_result": "Positive"}]}
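The JSONL output can be flattened into one row per site pair for downstream analysis. This is a minimal sketch based on the keys shown in the example above (`seq_id`, `sites`, `site`, `prediction_score`, `prediction_result`). The function name `flatten_jsonl` is hypothetical.

```python
import json


def flatten_jsonl(text):
    """Turn JSONL output text into (seq_id, site1, site2, score, result) rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        for entry in record["sites"]:
            site1, site2 = entry["site"]
            rows.append((
                record["seq_id"],
                site1,
                site2,
                entry["prediction_score"],
                entry["prediction_result"],
            ))
    return rows
```

For example, `[row for row in flatten_jsonl(text) if row[4] == "Positive"]` keeps only the pairs predicted as crosstalk, where `text` is the content of the JSONL output file.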

Citation

@article{huang2024deeppct,
    author = {Huang, Yu-Xiang and Liu, Rong},
    title = {Improved prediction of post-translational modification crosstalk within proteins using DeepPCT},
    year = {2024},
    doi = {10.1093/bioinformatics/btae675},
    url = {https://doi.org/10.1093/bioinformatics/btae675},
    journal = {Bioinformatics}
}
